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Domain  Decomposition  Strategies  for  Solving  the  Maxwell  Equations 

on  Distributed  Parallel  Architectures 

Douglas  C.  Blake^  and  Thomas  A.  Buter§ 

Air  Force  Institute  of  Technology,  Wright-Patterson  Air  Force  Base,  OH  45433 


Abstract 

Domain  decomposition  strategies  for  solving  hyperbolic  sys¬ 
tems  of  partial  differential  equations  on  distributed-memoiy 
parallel  computing  platforms  are  investigated.  The  logically- 
rectangular  computational  domain  is  divided  either  one,  two, 
or  three  dimensionally  into  a  series  of  computational  blocks, 
and  each  block  is  assigned  to  a  single  processor.  Theoretical 
predictions  using  standard  parallel  performance  models  indi¬ 
cate  that  higher-dimensional  decompositions  provide  supe¬ 
rior  parallel  program  performance  in  terms  of  scalability. 
The  theory  is  tested  using  a  finite-volume  time-domain 
(FVTD)  Maxwell  equations  solver  to  compute  the  electro¬ 
magnetic  fields  inside  a  rectangular  waveguide  using  various 
grid  sizes  and  processor  numbers  on  three  different  parallel 
architectures-the  Intel  Paragon,  the  IBM  SP2,  and  the  Cray 
T3D.  The  specific  performance  of  the  FVTD  algorithm  on 
the  three  machines  is  investigated,  the  relation  between  pro¬ 
cessor  connection  topology  and  message  passing  perfor¬ 
mance  of  a  typical  grid-based  hyperbolic  equation  solver  are 
identified,  and  the  results  are  used  to  augment  the  classical 
parallel  performance  model.  Although  clear  performance 
trends  emerge  in  terms  of  the  dimensionality  of  the  decom¬ 
position,  results  indicate  that  higher-dimensional  decompo¬ 
sitions  do  not  always  provide  superior  parallel  performance. 

1  Introduction 

Numerical  simulations  of  electromagnetic  or  fluid  flow  phe¬ 
nomena  are  most  often  constrained  in  their  complexity  by 
available  computational  processing  capability.  Although 
complex  three-dimensional  calculations  were  once  the 
exclusive  realm  of  the  vector  supercomputer,  through  recent 
dramatic  advances  in  computer  hardware  technology- 
including  the  development  of  powerful  Reduced  Instruction 
Set  Computing  (RISC)  microprocessors,  high-density 
Dynamic  Random  Accesses  Memory  (DRAM),  and  very- 
high-speed  switching  networks-designers  have  been  able  to 
construct  powerful  machines  comprised  of  hundreds  to  thou¬ 
sands  of  processors  which  are  often  capable  of  performance 
exceeding  that  of  traditional  vector  supercomputers.  Unfor¬ 
tunately,  efficient  utilization  of  these  parallel  machines 
requires  much  more  effort  on  the  part  of  the  algorithm 
designer  since  compilers  are  not  yet  able  to  fully  extract  the 
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parallelism  inherent  in  a  computer  code  and  properly  map 
that  parallelism  to  a  distributed-memory  environment. 

One  method  of  potentially  achieving  a  high  degree  of 
concurrency  in  typical  grid-based  scientific  computations 
such  as  those  found  in  time-domain  computational  electro¬ 
magnetics  (CEM)  or  computational  fluid  dynamics  (CFD) 
algorithms  is  to  divide  the  computational  domain  into  a 
series  of  subdomains  or  blocks  and  then  assign  the  blocks  in 
some  maimer  to  the  available  processors.  This  approach  has 
been  appUed  with  an  emphasis  toward  engineering  practical¬ 
ity  by  several  researchers  in  both  the  CFD  and  CEM 
communities'"®.  Furthermore,  the  issue  has  been  examined 
ft’om  a  parallel  efficiency  standpoint  by  Wong,  et  al.^ 

Since  the  practice  of  domain  decomposition  has  become 
increasingly  prevalent,  it  is  important  to  investigate  methods 
for  partitioning  and  assigning  the  computational  domain  to 
the  available  processors.  Because  the  problem  of  determin¬ 
ing  an  optimal  mapping  is  known  to  be  in  problem  space  NP 
Complete^,  no  such  attempt  is  made  here.  Instead  three 
domain  decomposition  approaches-one,  two,  and  three 
dimensional  (as  depicted  in  Figure  l)-are  examined  in  terms 
of  theoretical  and  measured  parallel  scalability.  Furthermore, 
since  modem  parallel  architectures  differ  widely  in  their  pro¬ 
cessor  capabilities,  interprocessor  connection  topologies  and 
communication  bandwidths,  it  is  reasonable  to  expect  that  a 
given  domain  mapping  caimot  yield  identical  parallel  seal- 
ability  across  all  platforms.  For  this  reason,  the  domain 
decomposition  analysis  is  performed  on  three  separate  paral¬ 
lel  platforms  which  are  discussed  in  the  following  section. 

2  Target  Architectures 

The  machines  chosen  for  use  in  this  study-the  Intel  Paragon, 
the  IBM  SP2  and  the  Cray  T3D-arguably  represent  the  most 
widely  available  currently  produced  distributed-memory 
computing  platforms.  Each  of  these  machines  is  similar  in 
that  each  utilizes  RISC  processors  as  its  central  processing 
units.  However,  the  interprocessor  connection  topology  of 
the  machines  differs  substantially.  The  Paragon  and  T3D 
both  employ  very-high-performance  static  interconnection 
networks  configured  as  a  two-dimensional  mesh  and  a  three- 
dimensional  torus,  respectively.  The  SP2,  on  the  other  hand, 
utilizes  an  omega  network  of  substantially  lower  bandwidth. 
In  addition  to  the  differences  in  connection  topology,  the 
T3D  provides  several  programming  options  not  available  on 
the  other  machines  due  to  its  memory  structure.  Although 
the  SP2  and  Paragon  are  true  distributed-memory  platforms. 
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Figure  1:  Domain  Decomposition  Techniques 


Figure  2:  Rectangular  Waveguide  Geometry 


the  T3D  is  classified  as  a  distributed-shared-memory 
machine  since,  although  memory  is  physically  distributed,  a 
single  global  address  space  is  available.  This  allows  any  pro¬ 
cessor  to  directly  address  memory  contained  on  another  pro¬ 
cessor.  It  is  also  possible  to  explicitly  deliver  data  between 
processors  through  the  use  of  a  message-passing  library  such 
as  that  encapsulated  in  Parallel  Virtual  Machine  (PVM)^.  In 
this  mode,  the  programming  environment  of  the  T3D  is  sim¬ 
ilar  to  that  of  the  Paragon  and  SP2. 

Although  several  message-passing  libraries  are  avail¬ 
able  for  each  machine,  the  libraries  used  in  this  study  were 
chosen  based  on  vendor  support,  portability,  and  perfor¬ 
mance.  At  the  inception  of  this  study,  PVM  was  actively  sup¬ 
ported  by  Cray  Research  for  the  T3D.  Similarly,  support  and 
documentation  for  the  Intel  message  passing  library,  NX*®, 
was  readily  available.  Finally,  Message  Passing  Interface** 
(MPI)  was  selected  for  the  SP2  due  to  its  portability  and  per¬ 
formance.  By  abstracting  the  library-specific  programming 
calls  through  the  use  of  the  macro  feature  in  the  C  program¬ 
ming  language,  a  single  computer  code  was  used  to  collect 
performance  data  on  all  three  machines.  A  summary  of  the 
salient  features  of  each  of  the  machines  appears  in  Table  1 . 

3  Problem  Description 


excited  in  a  TMZll  mode.  This  problem  has  been  exten¬ 
sively  studied  by  Shang"*’*^,  and  the  existence  of  an  analyti¬ 
cal  solution*^  allows  for  a  ready  verification  of  parallel 
program  correctness. 

3.2  Numerical  Procedure 


In  order  to  compute  the  electromagnetic  fields  inside  the 
rectangular  waveguide,  the  differential  vector  forms  of  the 
two  Maxwell  curl  equations  are  solved  using  a  collocated, 
cell-centered,  explicit  FVTD  scheme.  For  a  general  curvilin¬ 
ear  (^,  T),  Q  coordinate  system,  the  equations  can  be  written 
as 


^  ^  ^  ^ 
at  aq  ac 


(1) 


where  Q  contains  the  six  unknown  scalar  electromagnetic 
field  components,  J  contains  the  scalar  components  of  the 
surface  current,  and  E ,  F ,  and  G  are  known  as  the  flux  vec¬ 
tors.  Application  of  equation  (1)  to  a  general  hexahedral 
finite  volume  cell  yields 


=  0  (2) 

k=  1 


3.1  Model  Problem 

The  problem  selected  for  examining  the  various  domain 
decomposition  approaches  was  the  computation  of  the  elec¬ 
tromagnetic  fields  inside  a  rectangular  waveguide  as 
depicted  in  Figure  2.  The  computational  domain  for  the 
problem  was  uniform  and  Cartesian,  thereby  facilitating  a 
relatively  straightforward  partitioning  of  the  domain.  The 
physical  dimensions  a,  b,  and  L  of  the  waveguide  were 
scaled  to  re ,  ti  ,  and  1  respectively,  and  the  waveguide  was 


where  R  =  £|-i-Fti  +  GC.,  n*.  and  are  the  unit  surface 
normal  and  surface  area  of  cell  face  k,  respectively,  and  V  is 
the  cell  volume. 

In  order  to  advance  the  solution  in  time,  the  FVTD 
scheme  requires  the  evaluation  of  the  flux  vectors  at  the  cell 
faces.  To  do  so,  the  present  study  uses  a  flux-vector-splitting 
approach  after  Steger  and  Warming*®  which  splits  the  flux  at 
a  cell  face  into  positive  and  negative  components  according 
to  the  signs  of  the  eigenvalues  of  the  flux  Jacobian  matrix  A, 
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Cray  T3D 

(Eglin  Air  Force  Base,  FL) 

IBM  SP2 

(Maui  High  Performance 
Computing  Center) 

Intel  Paragon  XP/S 
(Wright  Patterson  Air 
Force  Base,  OH) 

central  processing  unit  (CPU) 

DEC  Alpha  21064  @  150 
megahertz 

IBM  RS/6000  @  66.7 
megahertz 

Intel  i/860XP  @  50 
megahertz 

single  processor  megaflop  rating 
(double  precision) 

150 

266 

75 

number  of  compute  processors 

128 

400 

368 

memory  per  CPU  (megabytes) 

64 

64-1024 

32-64 

interprocessor  connection  topology 

Three-dimensional  torus 
(nodes  configured  8x4x4) 

Omega  network 

Two-dimensional  mesh 

communication  bandwidth 
(megabytes/sec) 

300 

40 

200 

communication  library  used 

Parallel  Virtual  Machine 
(PVM) 

Message  Passing 
Interface  (MPI) 

NX 

maximum  processors  available  to  a 
single  job  (without  special  request) 

128 

128 

128 

Table  1:  Target  Machine  Characteristics 


tion. 

3.3  Parallel  Implementatioja 

The  MUSCL  scheme  described  in  the  previous  section 
forms  the  crux  of  the  data  dependencies  of  the  numeri¬ 
cal  scheme.  Referring  to  Figure  3,  in  order  to  compute 
the  total  flux  at  cell  face  i  -i- 1/2  ,  F,  +  1/2 ,  the  positive 
flux  component  requires  dependent  variable  information 
from  cells  i  -  1 ,  i,  and  i  -1- 1 .  Similarly,  the  negative 
flux  component  requires  information  from  cells  i,  /  -1- 1 , 
and  i  +  2.  These  data  dependencies  are  known  as  the 
computational  stencil  of  the  scheme  and  are  depicted  in 
the  top  portion  of  the  figure.  In  a  parallel  environment, 
one  or  more  of  these  cells  may  reside  on  different  pro¬ 
cessors,  and  thus  a  means  must  exist  to  transfer  neces¬ 
sary  information  between  processors.  The  transfer  of 
information  is  facilitated  through  the  use  of  buffer  stor¬ 
age  locations.  These  locations  serve  to  hold  dependent- 
variable  or  other  information  that  is  computed  on  one 
processor  but  necessary  for  other  computations  on 
another.  A  sample  of  the  buffer  locations  is  shown  as  the 
shaded  cells  in  the  lower  portion  of  Figure  3.  Using  this 
approach  for  the  simple  two-processor,  one-dimensional 
example  shown  in  the  figure,  the  flux  calculation  for  a 
cell  which  falls  on  a  boundary  created  by  the  domain 
decomposition  begins  as  processors  1  and  2  indepen¬ 
dently  compute  the  positive  and  negative  flux  compo¬ 
nent,  respectively,  for  cell  face  i+  1/2.  This 
calculation  requires  data  stored  in  the  buffer  locations 
on  each  processor.  Processor  1  passes  the  positive  flux 
component  to  processor  2  which  then  computes  the  total 


where  for  the  q  direction 


A  - 


dQ 


and  for  the  i+I/2  face  of  cell  i  (see  Figure  3), 
^1+1/2  =  ^  (Gi+ 1/2)  +  ^  (Gi+ 1/2) 


(3) 


(4) 


Similar  expressions  hold  for  the  fluxes  at  the  five 
remaining  cell  faces.  Note  that  the  positive  and  negative 
fluxes  in  equation  (4)  are  functions  of  the  dependent 
variables  at  the  left  and  right  states  (denoted  by  the 
superscripts  L  and  J?  in  the  equation)  of  cell  face 
i+1/2;  however,  the  dependent  variables  are  not 
defined  at  the  cell  faces  but  rather  at  the  cell  centers.  It 
thus  becomes  necessary  to  transfer  information  from 
cell  centers  to  cell  faces.  This  can  be  accomplished  in 
one  of  two  ways:  the  fluxes  can  be  evaluated  at  cell  cen¬ 
ters  and  then  extrapolated  to  cell  faces,  or  the  dependent 
variables  can  be  extrapolated  to  the  cell  faces  and  the 
fluxes  subsequently  evaluated.  The  latter  method  is 
known  as  Monotone  Upstream-centered  Schemes  for 
Conservation  Laws  (MUSCL)  and  is  the  methodology 
used  in  this  work.  Using  the  MUSCL  approach  after  van 
Leer^^,  the  extrapolation  yields  a  scheme  that  is  spa¬ 
tially  third-order  accurate. 

With  the  flux  evaluation  complete,  equation  (2)  is 
solved  over  each  cell  by  applying  a  two-stage,  second- 
order-accurate  Runge  Kutta  procedure  to  the  temporal 
derivative  term.  This  yields  a  fully  explicit  numerical 
procedure  that  is  ideally  suited  to  a  parallel  implementa¬ 
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(5) 


i+1/2 


Processor 

1 


2.  F'^'data  passed 
to  processor  2 


Processor  '  ^ 
2 


1  a.  F'^computed  on  processor  1 


3.  Total  flux  computed 
on  processor  2  and 
passed  to  processor  1 


1  b.  F  computed  on  processor  2 
Figure  3:  Flux  Message  Detail 


T  =  T  +  T 
par  calc  overhead 

The  overhead  consists  of  several  factors  including  com¬ 
munication  and  any  extra  calculations  necessary  to  imple¬ 
ment  the  code  on  a  parallel  machine.  For  this  analysis,  the 
parallel  overhead  is  assumed  to  be  dominated  by  the  commu¬ 
nication  time,  classical  cut-through-routing 

communication  cost  model  of  Kumar,  et  al.^^  is  used,  then 
the  communication  time  required  for  a  single  message  to 
travel  between  processors  a  and  b,  ,  is  given  by 


t  =  t  +  mt  +  It, 
comm  s  w  h 


where  is  the  message  start-up  time,  is  the  per-word 
transfer  time,  is  the  latency  associated  with  a  hop  between 
two  processors,  m  represents  the  number  of  words  trans¬ 
ferred,  and  I  represents  the  number  of  hops  the  message  must 
make  in  order  to  travel  from  processor  a  to  processor  b.  In 
most  modem  parallel  architectures,  the  per-hop  time  is 
extremely  small  and  all  processors  can  be  considered  com¬ 
putationally  close.  This  allows  the  communication  cost 
model  to  be  simplified®,  viz. 


^comm 


flux  for  the  cell  face  and  returns  this  information  to  proces¬ 
sor  1.  Assuming  the  fluxes  at  the  remaining  cell  faces  have 
been  calculated,  the  processors  subsequently  update  cells  i 
and  f  +  1 ,  respectively,  to  the  new  time  level.  For  a  higher¬ 
dimensional  problem,  the  message-passing  scenario  is 
repeated  for  each  of  the  decomposition  directions.  No  mes¬ 
sages  are  required  along  a  coordinate  direction  in  which  the 
computational  domain  is  not  decomposed.  Once  all  cells 
have  been  updated  to  the  new  time  level,  dependent-variable 
information  is  exchanged  so  the  buffer  storage  locations  con¬ 
tain  new  time  level  data.  Since  the  waveguide  was  consid¬ 
ered  filled  with  a  homogeneous  material  (free  space),  the 
wave  speed  was  uniform  throughout  the  computational 
domain.  This  allows  a  single  global  time  step  determination 
at  the  beginning  of  program  execution,  and  thus  no  global 
communications  are  required  by  the  algorithm  after  the  ini¬ 
tial  time  step  calculation. 

It  is  possible  to  double  the  size  of  the  buffers  and 
thereby  preclude  the  requirement  for  either  processor  to  send 
any  flux  data  to  the  other;  however,  this  approach  was  exam¬ 
ined  and  discarded  since  the  storage  penalty  exacted  was  not 
offset  by  any  significant  performance  gain. 

4  Parallel  Analysis 

4.1  Theoretical  Parallel  Performance 

Given  the  algorithm  data  dependencies  described  in  the  pre¬ 
vious  section,  the  theoretical  parallel  performance  can  be 
determined.  Parallel  mn  time,  can  be  assumed  to  con¬ 
sist  of  contributions  from  calculations,  T^alc’  from  paral¬ 
lel  overhead,  Tg^erhead’  i-e- 


For  this  analysis,  the  computational  domain  is  assumed 
to  be  three  dimensional  and  to  consist  of  n  cells  with 
n}^^  =  R  cells  distributed  along  each  of  the  three  coordinate 
directions.  Furthermore,  the  domain  is  assumed  to  be  evenly 
divided  among  the  number  of  available  processors,  p.  Thus, 
the  number  of  grid  cells  residing  on  each  processor,  tip,  is 
given  by 

rip  =  R^/p  (8) 

With  the  exception  of  processors  that  contain  the  grid 
cells  on  the  boundaries  of  the  computational  domain,  each 
processor  must  send  one  message  to  each  neighboring  pro¬ 
cessor  along  each  of  the  decomposition  directions  during  the 
numerical  flux  computations.  Furthermore,  once  the  flux 
computations  are  complete,  the  updated  dependent-variable 
information  stored  in  the  buffer  arrays  must  be  exchanged 
between  neighboring  processors.  Since  the  time  integration 
portion  of  the  algorithm  is  a  two-step  procedure,  this  sce¬ 
nario  is  repeated  twice  in  order  to  advance  the  solution  in 
time.  Consequently,  the  number  of  messages  sent  and 
received  per  time  step  is  16f,  and  the  length  of  each  message, 
m,  is 


ni.2  (1-0/i 

m  =  DR  p 


where  D  represents  the  number  of  storage  bytes  required  per 
grid  cell  for  the  dependent-variable  data  and  i  represents  the 
dimensionality  of  the  decomposition,  0  <  /  <  3 .  Note  that  for 
i  =  0 ,  the  computational  domain  is  not  decomposed,  and 
thus  i  =  0  =  1 .  In  all  cases,  the  term  is 

restricted  to  integral  values.  Since  the  total  communication 
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time  is  the  product  of  the  single  message  communication 
time  and  the  total  number  of  messages,  then 

+  (10) 

which  leads  to  the  theoretical  parallel  run  times  for  the  vari¬ 
ous  decompositions,  namely 

Tpar  =  V '  “  (1 1) 

where  is  the  single-processor  computation  time  per  grid 
cell  per  time  step. 

With  parallel  run  time  determined,  theoretical  absolute 
speedup,  S^,  and  absolute  efficiency,  £^-two  of  the  most 
common  metrics  for  parallel  performance-can  be  deter¬ 
mined  using  the  standard  definitions 

5.  =  T,/T^ar  (12) 

Ea  =  5/p  (13) 


algorithm  on  the  machine  of  implementation. 

Since  parallel  machines  provide  a  means  of  solving  rela¬ 
tively  large  problems  over  a  fairly  large  number  of  proces¬ 
sors,  it  is  desirable  to  predict  how  an  algorithm  will  perform 
as  both  the  problem  size  and  the  number  of  processors  is 
increased.  Isoefficiency  measures  the  scalability  of  an  algo¬ 
rithm  in  such  an  instance  and  is  determined  by  setting  the 
computational  work  equal  to  the  parallel  overhead 
function^^.  In  the  present  analysis,  the  isoefficiency  is 
derived  directly  from  equation  (16)  to  yield 

R  =  =>  «  =  0(/?^^‘)  (17) 

Thus,  in  order  to  maintain  a  constant  parallel  efficiency, 
a  computational  domain  that  is  decomposed  three  dimen¬ 
sionally  need  only  be  increased  in  size  by  an  amount  linearly 
proportional  to  increasing  processor  number.  One-  and  two- 
dimensional  decompositions  require  larger  increases  in  the 
size  of  the  computational  domain  as  increasing  numbers  of 
processors  are  applied  to  the  problem  if  the  efficiency  is  to 
be  maintained  at  a  constant  value. 


where  Tj  is  the  run  time  for  the  best  known  serial  algorithm 
to  solve  the  problem  in  question.  It  is  often  not  practical  to 
compare  the  chosen  parallel  algorithm  against  the  best 
known  serial  algorithm,  and  consequently,  relative  speedup, 
S,  and  efficiency,  E,  are  often  used  whereby  the  parallel  algo¬ 
rithm  run  time  is  compared  against  the  run  time  of  the  algo¬ 
rithm  on  a  single  processor.  Substituting  i  =  0 ,  p  =  1  into 
equation  (11)  to  determine  Tj ,  the  theoretical  relative  per¬ 
formance  is  thus  given  by 


^  +  f  [/?-2(f/f,)  +  Dp(i-')/<(ryg] 

E  -  - - - -  (15) 

At  this  point,  the  applicability  of  the  analysis  to  other 
implementations  can  be  extended  by  treating  the  terms  , 
t^,  and  16f  (the  number  of  message  passes)  as  constants. 
An  asymptotic  analysis  is  then  performed  in  which  p  and  R 
are  assumed  large  to  yield'® 


4.2  Machine  Performance  Characterization 

Determination  of  theoretical  performance  as  embodied  in 
equations  (14)  and  (15)  requires  that  the  values  of  and 
be  ascertained.  In  general,  and  are  machine  spe¬ 
cific,  while  t  depends  both  on  the  target  architecture  as  well 
as  the  algorifiim.  In  order  to  determine  ,  the  FVTD  algo¬ 
rithm  was  run  for  100  time  steps  on  a  smgle  processor  on 
each  of  the  three  machines  using  a  variety  of  grid  sizes  up  to 
the  maximum  size  containable  in  the  machine’s  core  mem¬ 
ory.  The  times  for  these  execution  runs  are  contained  in  Fig¬ 
ure  4.  Although  not  presented  here,  a  formal  analysis  of  the 
FVTD  algorithm  run-time  complexity  reveals  that  the 
method  is  in  time  space  0(n)  where  n  is  the  number  of  cells 
in  the  computational  domain.  This  analysis  is  corroborated 
by  the  data  exhibited  in  the  figure  which  show  the  ran  times 
to  be  a  linearly  increasing  function  of  n.  Any  slight  devia¬ 
tions  from  linearity  can  be  explained  by  noting  that  boundary 
condition  cells  require  less  computational  work,  and  as  the 
grid  size  decreases,  the  boundary  condition  points  have  an 
increased  affect  on  the  algorithm  run  time.  Using  a  simple 
least-squares  fit  of  the  data,  was  determined  from  the 
slope  of  each  plot. 


1 

E  =  - ry. - 

l+&ip  /R) 

where  0  indicates  a  tight  upper  bound. 


(16) 


It  is  important  to  note  that  the  asymptotic  analysis 
encapsulated  in  (16)  is  taken  for  p  and  R  appropriately  large 
when  in  fact,  for  a  practical  implementation,  it  may  not  be 
possible  to  let  either  variable  grow  suitably  large  due  to 
memory  or  machine  architecture  restrictions.  Consequently, 
a  consideration  of  the  constants  appearing  in  (14)  and  (15) 
may  be  necessary  when  assessing  true  performance  of  the 


The  message-passing  performance  of  each  of  the  three 
machines  was  determined  by  configuring  eight  processors  as 
a  logical  ring  and  circulating  messages  of  varying  size  1000 
times  around  the  ring.  Run  time  data  for  this  experiment 
appears  in  Figure  5.  By  performing  a  linear  curve  fit  of  the 
data,  the  message  start-up  time  and  per-word  transfer  time 
can  be  computed  directly  from  the  y-intercept  and  line  slope. 
Although  the  linear  curve-fitting  process  generated  excellent 
results  (goodness  of  fit  >  0.99  in  all  cases),  the  SP2  is  char¬ 
acterized  by  marked  and  somewhat  random  variations  in 
execution  time.  This  variation  is  most  likely  due  to  message 
contention  over  the  omega  switching  network.  Table  2  con- 
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Figure  4:  Single  Node  Algorithm  Run  Times 


tains  a  summaiy  of  the  findings  of  this  portion  of  the  study. 
The  values  agree  in  general  with  those  of  Foster®.  Any  dis¬ 
crepancies  can  be  explained  by  differences  in  operating  sys¬ 
tems  and  message-passing  libraries  used. 


5  Domain  DecQmpo.sitiQ.a.Stu.dy 
5.1  Test  Procedure 

As  is  evident  in  the  theoretical  derivations,  parallel  speedup 
and  efficiency  are  dependent  on  a  variety  of  parameters 
including  the  problem  size  and  the  number  of  processors  on 
which  the  algorithm  is  executed.  In  order  to  test  these  param¬ 
eters,  the  FVTD  algorithm  was  mn  for  100  iterations  on  grid 
sizes  ranging  from  R  =  32  to  R  =  128  using  up  to  128 
processors.  One-hundred  iterations  was  deemed  an  accept¬ 
able  number  so  that  run  times  would  not  be  excessive  yet  any 
transient  parallel-environment  effects  such  as  interprocess- 
message  contentions  would  be  suitably  minimized.  In  every 
decomposition,  the  number  of  finite-volume  cells  residing 
on  each  processor  was  identical.  This  reduced,  but  did  not 
completely  eliminate,  load  imbalance  since  boundary  condi¬ 
tion  cells  require  less  computational  effort  than  interior  cells. 
The  choice  of  domain  decompositions  and  grid  sizes  was 
constrained  primarily  by  the  amount  of  memory  available  on 
each  processor.  This  was  especially  trae  in  the  case  of  the 
Paragon  which,  with  the  exception  of  16  “MP”  nodes,  pos¬ 
sesses  only  32  megabytes  per  processor.  In  addition  to  the 
memory  constraint,  additional  restrictions  were  imposed  by 
the  processor-allocation  scheme  of  the  T3D.  The  current  ver¬ 
sion  of  the  operating  system  on  this  machine  allows  proces¬ 
sors  to  be  allocated  only  in  powers  of  two.  This  required  that 
the  computational  domain  be  partitioned  in  powers  of  two 
along  each  decomposition  direction. 


Figure  5:  Ring  Message  Passing  Times 
(1000  rings  on  8  processors) 


In  addition  to  the  dimensionality  of  the  decomposition, 
the  directional  dependence  of  the  decompositions  was 
assessed  by  permuting  the  decompositions  for  each  coordi¬ 
nate  direction.  The  permutations  correspond  to  a  re-orienta- 
tion  of  the  planes  or  lines  of  grid  points  as  depicted  in  Figure 
1.  Examination  of  the  decomposition  along  each  direction 
allows  for  an  assessment  of  machine  memory-accessing  per¬ 
formance  and  uncovers  potential  bus  contentions  which  are 
the  inevitable  result  of  mapping  a  higher-dimensional  physi¬ 
cal  problem  onto  a  lower-dimensional  interconnection  net¬ 
work.  Table  3  contains  a  summary  of  the  decompositions 
examined.  Rather  than  list  each  grid  size,  processor  count, 
and  decomposition  separately,  they  are  combined  into  a  sin¬ 
gle  listing  whenever  possible  with  the  understanding  that  all 
possible  combinations  on  a  given  table  row  were  examined. 

The  FVTD  code  used  to  conduct  the  study  was  written 
in  C  to  take  advantage  of  that  language’s  dynamic  memory 
allocation  routines.  Memory  for  each  grid  decomposition 
was  allocated  at  run  time.  This  facilitated  the  examination  of 


(psec) 

(psec) 

(psec) 

T3D 

102 

25 

.030 

Paragon 

546 

36 

.018 

SP2 

132 

76 

.040 

Table  2:  Machine  and  FVTD  Algorithm  Performance 
Parameters 
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Dimension 

of 

Partition 

Grid 

Size 

(.R) 

Processors 

(P) 

Decomposition* 

ID 

32, 64, 
128 

4,  8, 16, 

32, 64,  128 

Px  1  X  1 
such  that  P<R 

2D 

32,  64, 

96 

4, 16,  64 

Jp^Jpy-X 

2D 

32,  64, 

96 

8,  32,  128 

3D 

32,  64, 

96 

8, 64 

l/pxl/pxl/p 

3D 

32,  64, 

96 

16, 128 

3D 

32, 64, 

96 

32 

2x4x4 

*A11  permutations  of  the  tabulated  decompositions  were  per¬ 
formed.  For  example,  in  addition  to  the  stated  P  x  1  x  1  one¬ 
dimensional  partitioning,  1  x  P  x  1  and  1  x  1  x  P  partitions 
were  also  examined.  The  decomposition  nomenclature  refers 
to  the  number  of  blocks  by  which  the  computational  domain 
was  partitioned  in  each  coordinate  direction. 

Table  3:  Summary  of  Examined  Decompositions 


a  large  number  of  decompositions  during  a  single  program 
run  with  no  memory  wasted  due  to  static  array  dimension¬ 
ing.  All  message  passing  was  performed  using  non-blocking 
communication  primitives  with  special  care  taken  to  inter¬ 
leave  communication  and  computation  wherever  possible. 

As  timing  data  was  collected,  several  runs  were  re- 
accomplished  to  assess  the  repeatability  of  the  timing  data. 
In  the  case  of  the  T3D  and  the  Paragon,  results  were  found  to 
be  repeatable  to  well  within  one  percent.  However,  such  was 
not  the  case  for  the  SP2  where  timing  data  varied  much  more 
dramatically  (most  likely  due  to  the  effects  observed  in  Fig¬ 
ure  5).  In  order  to  ensure  accurate  results,  all  runs  on  the  SP2 
were  accomplished  at  least  five  times  and  the  minimum  time 
observed  was  used  as  the  execution  time.  This  procedure  is 
similar  to  that  recommended  by  Thinking  Machines  in  the 
timing  studies  conducted  by  Blosch'. 

5.2  Results 

5.2.1  Electromagnetic  Field  Computations 

Figure  6  contains  a  comparison  of  the  magnitude  of  the  com¬ 
puted  and  exact  magnetic  fields  inside  the  waveguide.  The 
computed  solution  was  obtained  from  a  32-node  Paragon  run 
using  approximately  110,000  grid  points  with  the  computa¬ 


Figure  6:  Total  Magnetic  Field  Contours 
(a)  exact,  (b)  computed 

tional  domain  partitioned  three-dimensionally  in  a  4x2x4 
configuration.  In  the  figure,  the  y  and  z  axes  have  been 
scaled  so  that  the  cutting  planes  located  at  y  =  0.6,  1.6,  and 
2.5  are  unobstructed.  The  computed  and  exact  solutions  are 
in  excellent  agreement;  in  fact,  the  plots  are  indistinguish¬ 
able.  While  not  a  formal  proof  of  correctness,  it  does  indicate 
that  the  parallel  algorithm  functioned  as  intended. 

5.2.2  Parallel  Scalability 

The  results  of  the  domain-decomposition  studies  appear  in 
Figures  7-12.  Figures  7-9  contain  parallel  speedup  results  for 
the  one-,  two-,  and  three-dimensional  decompositions, 
respectively,  while  Figures  10-12  contain  efficiency  results. 
A  comparison  of  the  relative  performance  of  the  FVTD  algo¬ 
rithm  for  a  given  dimensionality  of  decomposition  on  each 
of  the  three  machines  can  be  conducted  by  examining  sub¬ 
figures  a,  b,  and  c  of  a  single  figure.  On  the  other  hand,  a 
comparison  of  the  sub-figures  in  a  given  column  facilitates 
an  examination  of  the  effects  of  the  dimensionality  of  the 
decomposition  for  a  given  platform.  It  is  apparent  from  the 
figures  that  in  nearly  every  case,  the  two-  and  three-dimen¬ 
sional  decompositions  produced  performance  superior  to 
that  of  the  one-dimensional  decompositions.  FurAermore, 
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three-dimensional  decompositions  exhibited  slightly  better 
performance  in  several  instances  on  the  T3D  while  two- 
dimensional  decompositions  were  slightly  superior  in  gen¬ 
eral  on  the  Paragon  and  noticeably  superior  on  the  SP2.  This 
is  especially  evident  in  an  examination  of  the  efficiency 
curves  of  Figures  10-12.  These  trends  in  parallel  perfor¬ 
mance  correspond  quite  closely  to  the  topologies  of  the  three 
machines.  The  three-dimensional  torus  structure  of  the  T3D 
allows  each  processor  six  neighboring  processors  while  the 
two-dimensional  mesh  structure  of  the  Paragon  translates  to 
at  most  four  neighbors  for  a  given  computational  node.  In 
contrast,  the  omega  network  of  the  SP2  requires  that  any 
communication  between  two  processors  traverse  at  least  one 
switch  hop  and  two  communication  lines.  It  appears  that 
superior  parallel  performance  is  achievable  when  there  is  a 
close  agreement  between  the  physical  dimensionality  of  the 
domain  decomposition  and  the  physical  topology  of  the 
architecture  onto  which  it  is  mapped.  It  should  be  remem¬ 
bered  that  the  speedup  and  efficiency  curves  of  Figures  7-12 
provide  one  measure  of  the  scalability  of  the  algorithm  and 
do  not  reflect  actual  program  execution  times.  For  example, 
although  the  curves  show  the  scalability  of  the  algorithm  on 
the  SP2  to  be  decidedly  less  than  that  on  the  other  two  plat¬ 
forms,  the  superior  performance  of  the  RS/6000  when  com¬ 
pared  to  the  i/860XP  yielded  substantially  faster  mn  times. 
In  other  cross-platform  comparisons,  the  T3D  was  able  to 
significantly  outperform  the  other  two  machines  both  in 
terms  of  parallel  scalability  and  in  terms  of  absolute  execu¬ 
tion  speed.  This  is  despite  the  comparatively  poor  perfor¬ 
mance  of  the  message  passing  (as  shown  in  Figure  5)  which 
is  most  likely  due  to  the  use  of  PVM. 

5.2.3  Comparison  with  Theoretical  Model 

A  comparison  of  the  theoretical  and  measured  parallel 
speedups  for  a  two-dimensional  decomposition  (R  =64) 
appears  in  Figure  13.  The  theoretical  curve  was  generated  by 
substimting  the  measured  values  contained  in  Table  2  into 
equation  (14).  The  measured  speedup  is  somewhat  over  pre¬ 
dicted,  but  this  is  not  surprising  since  the  theoretical  model 
neglects  such  issues  as  load  imbalance  and  message  conten¬ 
tion.  It  is  therefore  expected  that  this  model  provides  a  best- 
case  performance  prediction.  The  behavior  of  the  model  with 
respect  to  variations  in  the  ratio  t^/t^  -  X  is  also  shown  in 
the  figure.  Using  the  value  for  and  found  in  Table  2 
yields  the  ratio  %  =  0.0003  .  A  value  of  x  =  0.0009  is  also 
shown  for  reference  purposes.  It  is  not  unreasonable  to 
expect  fairly  large  variations  in  %  for  different  decomposi¬ 
tions  due  to  differences  in  memory  accessing  patterns.  In 
fact,  these  variations  are  demonstrated  in  section  5.2.4. 

Although  the  theoretical  model  was  able  to  predict  the 
performance  of  certain  decompositions  on  the  T3D  to  an 
acceptable  manner,  such  was  not  the  case  for  the  Paragon 
and  the  SP2.  In  these  cases,  the  model  drastically  over  pre¬ 
dicted  parallel  performance.  Since  it  is  known  that  %  plays  a 
primary  role  in  parallel  performance,  a  more  realistic  test 
problem  was  conceived  to  measure  .  Instead  of  utilizing  a 
ring  stmcture  in  which  a  single  message  is  in  transit  at  any 


given  instant,  processors  were  configured  as  a  logical  three- 
dimensional  mesh  of  dimension  2x2x2  and  4x4x4. 
Nearest  neighbors  in  the  mesh  simultaneously  exchanged 
messages  of  varying  lengths  along  each  coordinate  direction 
and  the  times  for  1000  exchanges  in  each  direction  were 
recorded.  The  results  of  this  experiment  are  contained  in  Fig¬ 
ure  14.  In  the  absence  of  message  contention,  the  message¬ 
passing  times  are  expected  to  be  identical  regardless  of  pro¬ 
cessor  number  or  direction  of  message  exchange  since  no 
two  messages  simultaneously  transit  the  same  logical  con¬ 
nection  between  processors.  In  reality,  however,  the  mapping 
of  the  logical  three-dimensional  structure  to  a  lower-dimen¬ 
sionality  architecture  results  in  the  mapping  of  more  than 
one  logical  connection  to  the  same  physical  connection. 
Although  all  three  machines  exhibit  some  degree  of  variation 
in  message-passing  times  as  the  number  of  processors  is 
increased,  the  effect  is  much  less  dramatic  on  the  T3D.  The 
deviation  in  the  general  trend  is  also  small  for  the  SP2,  but 
the  variations  exhibited  in  Figure  5  become  much  more  pro¬ 
nounced  as  the  number  of  processors  is  increased.  In  addi¬ 
tion  to  the  variation  with  processor  number,  the  Paragon  also 
exhibits  a  directional  bias  in  message  transfer  times  which 
becomes  more  pronounced  as  the  number  of  processors 
increases.  In  the  figure,  the  i,  j,  and  k  notation  simply  identi¬ 
fies  a  direction  along  which  the  message  exchanges 
occurred.  The  magnimde  of  the  variations  in  message-pass¬ 
ing  times  relates  directly  to  the  quality  of  the  theoretical  par¬ 
allel  performance  model,  and  large  variations  in  message¬ 
passing  times  are  expected  (and  observed)  to  adversely 
impact  the  theoretical  predictions. 

5.2.4  Variations  in  Decomposition  Times 

The  results  contained  in  Figures  7-12  represent  the  best 
observed  times  for  a  given  dimensionality  of  decomposition, 
processor  number,  and  grid  size.  As  noted  in  Table  3,  all  per¬ 
mutations  of  a  given  decomposition  were  performed  for  each 
case.  Although  each  permutation  yielded  the  same  number 
of  grid  cells  on  a  given  processor  and  the  same  amount  of 
message  traffic,  certain  decompositions  were  observed  to 
yield  substantially  better  performance  than  others.  The  per¬ 
formance  differences  were  quantified  by  constracting  the 
ratio 

C  =  T  /T  ■ 

^  max  min 

where  T  ■  and  T  represent  the  minimum  and  maxi- 
mum  observed  ran  times  for  each  combination  of  processor 
number,  grid  size,  and  decomposition  dimensionality.  Figure 
15  depicts  this  ratio  for  the  two-dimensional  decompositions 
on  the  SP2.  Although  the  trends  differed  depending  on  the 
machine  and  decomposition  approach,  variations  of  similar 
magnitude  were  observed  for  the  Paragon  and  T3D.  The 
sometimes  marked  variations  indicate  that  memory  access¬ 
ing  issues  are  as  important  as  the  dimensionality  of  the 
decomposition  in  achieving  good  performance.  This  is  to  be 
expected  since  the  RISC  processors  employed  in  all 
machines  require  a  high  degree  of  data  locality  in  order  to 
achieve  near-advertised  megaflop  ratings.  Should  the 
domains  be  decomposed  in  a  manner  that  precludes  locality- 
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Speedup  ^  Speedup 


Figure  7:  One-Dimensional  Decomposition  Speedups  a)  T3D,  b)  Paragon,  c)  SP2 


Figure  8:  Two-Dimensional  Decomposition  Speedups  a)  T3D,  b)  Paragon,  c)  SP2 


Figure  9:  Three-Dimensional  Decomposition  Speedups  a)  T3D,  b)  Paragon,  c)  SP2 


Figure  11:  Two-Dimensional  Decomposition  Efficiencies  a)  T3D,  b)  Paragon,  c)  SP2 


Processors  Processors  Processors 

(a)  (b)  (c) 

Figure  12:  Three-Dimensional  Decomposition  Efficiencies  a)  T3D,  b)  Paragon,  c)  SP2 


Figure  13:  Theoretical  and  Measured  Performance  of 
theT3D 

(R  =  64 , 2D  decomposition) 
of-reference,  then  performance  suffers  dramatically. 

6  Conclusions 

The  relation  between  the  dimensionality  of  the  domain 
decomposition  and  parallel  performance  has  been  assessed 
on  three  modem  distributed  memory  parallel  computing 
platforms  using  a  FVTD  algorithm  and  a  rectangular 
waveguide  geometry.  Higher  dimensionality  (two-  and  three- 
dimensional)  decompositions  were  found  to  nearly  always 
outperform  one-dimensional  decompositions.  The  perfor¬ 
mance  of  the  decompositions  was  found  to  relate  very 
closely  to  the  topology  of  the  machine  on  which  the  algo¬ 
rithm  was  implemented.  In  general,  machines  with  higher 
processor  cormectivity  favored  higher-dimensional  decom¬ 
positions. 

Despite  the  fact  the  classical  parallel  performance  model 
used  in  this  study  accurately  predicted  performance  trends,  it 
occasionally  dramatically  over  predicted  the  actual  perfor¬ 
mance  of  the  algorithm.  This  is  due  to  the  fact  that  the  model 
does  not  account  for  issues  such  as  message  contention 
which  occurs  when  more  than  one  logical  message  path  is 
mapped  to  the  same  physical  connection.  The  adverse  affect 
of  message  contention  was  observed  to  increase  with 
increasing  processor  count  on  all  architectures. 

Although  caution  must  be  exercised  when  attempting  to 
extend  program  performance  characteristics  to  other  algo¬ 
rithms,  because  many  algorithms  designed  for  solving  hyper¬ 
bolic  systems  of  partial  differential  equations  possess  similar 
data  dependencies,  the  results  presented  here  can  be  general¬ 
ized  at  least  to  some  degree  to  a  large  number  of  explicit 
hyperbolic  (and  perhaps  even  some  parabolic  and  elliptic) 
partial  differential  equations  solution  schemes. 


(a) 


(b) 


Figure  14:  Nearest-Neighbor  Message  Passing  Per¬ 
formance  a)  T3D,  b)  Paragon,  c)  SP2 
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Figure  15:  Example  of  Maximum  to  Minimum 
Execution  Time  Ratio 
(2D  decompositions  on  SP2) 
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Abstract — The  accuracy  of  surface  patch  and  wire 
grid  moment  method  models  for  the  computation 
of  near  fields  is  investigated.  A  sphere  and  a  flat 
plate  with  plane  wave  illumination  eire  examined. 

It  is  found  that  wire  grids  exhibit  stronger 
near  field  anomalies  than  surface  patches,  which 
have  the  current  more  distributed  over  the  sur¬ 
face.  Nevertheless,  good  results  can  be  obtained 
with  a  wire  grid,  provided  that  a  small  distance 
fi-om  the  wire  grid  sxu-face  is  maintained. 

The  surface  patch  results  are  obtained  using 
the  Junction  code.  Wire  grid  results  are  obtained 
with  both  the  MBC  and  NEC  codes.  Validation  for 
the  sphere  is  by  comparison  with  an  exact  solu¬ 
tion,  and  validation  for  the  plate  is  by  compari¬ 
son  with  a  high  frequency  UTD  solution  obtained 
firom  the  NECBSC  code. 


1  Introduction 

The  near  field  close  to  the  surface  of  a  complex  shape 
is  of  great  interest  in  antennas  and  electromagnetic 
compatibility.  For  example,  the  radiation  character¬ 
istics  of  an  aircraft  antenna  are  distorted  by  the  fuse¬ 
lage  on  which  the  antennas  are  mounted.  Another 
example  is  in  the  assessment  of  electromagnetic  haz¬ 
ards  to  persoimel  and  equipment  on  the  deck  of  a 
ship,  in  the  presence  of  strong  RF  and  microwave 
sources. 

The  method  of  moments  is  a  suitable  methodol¬ 
ogy  for  the  calculation  of  fields  scattered  by  bodies 
of  resonant  size  and  smaller.  Numerous  codes  exist, 
and  assessment  of  their  accuracy  for  the  computa¬ 
tion  of  electromagnetic  fields  has  been  a  topic  of  on¬ 


going  research  for  many  years.  Historically,  wire  grid 
models  were  the  first  methodology  which  permitted 
moment  method  modehng  of  scattering  by  complex 
shapes.  Richmond’s  Thin  Wire  code  [1]  was  a  pi¬ 
oneering  efiort  in  this  direction.  His  code  was  later 
extended  significantly  by  Tilston  and  Balmain  as  the 
Multiradius  Bridge  Current  MBC  code  [2].  The  Nu¬ 
merical  Electromagnetic  Code  NEC  was  developed 
by  Burke  et  al.  [3].  The  development  of  surface 
patch  codes  such  as  Patch  by  Rao  et  al.  [4],  Junc¬ 
tion  by  Hwu  et  al.  [5],  the  Electromagnetic  Surface 
Patch  Code  ESP,  by  Newman  [6],  and  others,  have 
further  expanded  the  applicability  of  the  moment 
method. 

For  smooth  bodies  without  sharp  edges,  surface 
patches  can  accurately  model  the  surface  current. 
On  the  other  hand,  wire  grids  can  also  be  useful,  as 
an  edge  current  can  be  more  accurately  handled  by  a 
wire  than  a  patch.  A  patch  cannot  represent  the  cur¬ 
rent  at  the  patch  edge,  so  a  separate  “edge  mode”  is 
required.  Inclusion  of  edge  modes  have  been  shown 
to  enhance  the  accuracy  [7],  though  their  incorpora¬ 
tion  into  a  general  purpose  code  is  not  straightfor¬ 
ward.  Another  reason  for  using  wires  is  that  if  open 
bodies  are  modeled  with  NEC,  we  must  use  a  wire 
grid  model,  as  its  MFIE  based  patch  model  is  only 
appropriate  for  closed  bodies. 

Although  much  work  has  been  done  on  the  val¬ 
idation  of  wire  grids  and  patches  for  far  field  cal¬ 
culations,  investigations  into  the  near  field  are  rel¬ 
atively  scarce.  Ludwig  [8]  examined  a  2-D  TM  po¬ 
larized  wire  grid  model  of  a  cylinder  and  foimd  that 
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though  the  tangential  field  is  not  ax;curate  between 
the  wires,  the  far  field  is  accurate,  provided  that  the 
“same  surface  rule”  is  met,  i.e.  that  the  total  surface 
area  of  the  wires  equals  the  surface  area  of  the  true 
surface  being  modeled.  Later,  Paknys  [9]  extended 
Ludwig’s  work  and  demonstrated  that  the  same  sur¬ 
face  rule  is  also  optimum  for  the  near  field  of  a  2-D 
TM  cylinder.  Other  work  has  examined  the  use  of 
surface  patches  in  near  field  computations.  Yang  et 
al.  [10]  examined  a  2-D  cylinder  and  demonstrated 
the  equivalence  of  pulse  basis  patch  currents  and 
filamentary  wire  currents,  provided  that  the  same 
surface  rule  is  used.  Kashyap  and  Louie  [11]  com¬ 
pared  the  surface  currents  of  plates  made  of  either 
wire  grids  or  patches,  and  found  that  the  edge  wires 
have  to  be  made  thinner  to  obtain  agreement  with 
a  patch  model.  Burton  et  al.  [12]  also  used  a  patch 
model  to  study  near  fields.  They  constructed  a  suf¬ 
ficiently  detailed  model  that  enabled  one  to  examine 
the  leakage  through  gaps  in  a  door  on  a  closed  box. 
Kemptner  [13]  computed  the  near  fields  of  a  metal¬ 
lic  cube  and  an  airplane,  using  a  patch  formulaton. 
His  results  for  a  cube  agreed  well  with  the  measured 
surface  current  and  electric  field. 

This  paper  is  an  investigation  into  the  accuracy 
of  near  zone  tangential  and  normal  electric  fields  for 
3-D  bodies,  as  computed  from  siuface  patch  and  wire 
grid  models.  A  square  plate  and  a  sphere  with  plane 
wave  illumination  axe  used  as  test  cases.  The  accu¬ 
racy  of  the  patch  models  are  compared  to  a  UTD 
solution  for  the  plate,  and  an  exact  solution  for  the 
sphere.  For  the  wire  grid  models,  the  same  surface 
rule  and  the  extent  of  near  field  anomalies  are  inves¬ 
tigated. 

Section  2  examines  the  near  fields  of  the  plate. 
Section  3  examines  the  near  fields  of  the  sphere.  Sec¬ 
tion  4  compares  two  different  moment  method  wire 
codes,  MBC  and  NEC.  Section  5  contains  the  con¬ 
clusions. 

2  Near  Field  of  the  Plate 

The  square  plate  is  1  x  1  m  in  size  and  hes  in  the 
x-y  plane  with  the  origin  at  the  center  of  the  plate. 
A  wire  grid  model  that  has  a  grid  size  p  =  0.1  m  is 
shown  in  Fig.  1.  The  wire  radius  is  a^;  =  0.0145  m,  in 
accordance  with  the  same  surface  rule.  The  surface 


Figure  1:  Wire  grid  model  for  the  1  x  1  m  plate.  An 
X  polarized  plane  wave  of  1  V/m  is  normally  incident 
from  above.  The  field  point  is  in  the  y  =  0  plane. 

patch  model  is  similar,  except  that  each  square  area 
is  divided  into  two  triangular  patches. 

The  incident  field  on  the  plate  is  a  plane  wave 
with  an  amplitude  of  1  V/m,  polarized  in  the  x  di¬ 
rection  and  traveling  in  the  —z  direction,  i.e.  = 
point  is  in  the  y  =  0  plane.  The  near 
field  was  calculated  for  several  cases,  and  the  total 
field  E  =  was  plotted,  using  a  reference 

level  of  0  dB  =  1  V /m.  Unless  otherwise  specified, 
the  frequency  is  300  MHz  so  that  g  =  O.IA. 

A.  Junction  and  UTD  Models  for  the  Plate 

The  scattering  by  a  plate  does  not  have  an  exact  so¬ 
lution,  so  results  from  the  UTD  based  BSC  code  [14] 
were  compared  with  the  Junction  MM  patch  code, 
to  establish  confidence  in  the  patch  model.  Fig.  2 
shows  the  tangential  field  Ex  and  the  normal  field 
Ez  at  zjg  =  ±0.04.^  The  agreement  is  within  1.2  dB 
for  Ex  for  all  values  of  x  along  the  plate.  The  agree¬ 
ment  is  also  within  1.2  dB  for  Ez  near  the  plate,  but 
gets  worse  beyond  the  plate  edges  where  |a;|  >  0.5. 
The  reason  for  this  is  unknown,  but  it  is  speculated 
that  further  improvements  could  be  obtained  by  the 
inclusion  of  multiple  diffraction  effects  in  the  UTD 

^  |£z  I  is  the  same  on  both  sides  of  the  plate  so  2  <  0  is  not 
shown. 
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Figure  2:  Comparison  of  UTD  { — )  and  Junction 
(ooo)  for  the  plate,  with  patch  size  g  =  O.IA.  (a) 
Ex,  zjg  =  ±0.4  (b)  Ez,  zfg  =  0.4.  The  field  point 
is  in  the  y  =  0  plane. 


model,  and  possibly  reduced  patch  size  in  the  patch 
model.  We  chose  Junction  for  the  subsequent  plate 
model  validations. 

B.  Extent  of  the  Near  Field  Anomalies 

The  near  field  of  the  wire  grid  plate  was  calculated 
with  NEC  and  compared  to  Jimction.  A  frequency 
of  300  MHz  and  g  =  O.IA  was  used.  Fig.  3  shows 
the  near  field  at  a  distance  of  z/g  =  ±0.4  and  zjg  = 
±0.8.  At  z/g  =  0.4,  the  anomalies  in  Ex  and  Ez 
are  of  comparable  magnitude.  For  Ex,  the  onset  of 
anomalies  occurs  at  about  zjg  =  0.4  on  the  lit  side, 
and  zfg  =  —0.8  on  the  shadow  side.  An  increased 
grid  size  of  y  =  0.25  m  at  300  MHz  was  also  tried, 
and  anomalies  of  comparable  magnitude  were  ob¬ 
tained  at  zjg  =  0.1  on  the  lit  side  and  zjg  =  —0.2 
on  the  shadow  side.  This  suggests  that  the  onset 
of  anomalies  for  fiat  plate  structures  occurs  when 
zjg  «  0.4  on  the  lit  side,  and  \z\/g  w  0.8  on  the 
shadow  side. 

The  results  in  Fig.  3  show  that  for  a  given  ob¬ 
server  height,  the  wire  grid  results  are  not  as  smooth 
as  those  obtained  in  Fig.  2  using  Junction.  It  was 
found  that  anomalies  of  comparable  magnitude  could 
also  be  observed  using  Jimction,  but  only  for  field 
points  much  closer  to  the  surface. 

C.  Test  of  the  Same  Surface  Rule 

It  is  widely  accepted  that  the  same  surface  rule  gives 
the  best  result  for  the  field  radiated  by  a  wire  grid 
model.  To  test  this  assertion,  NEC  was  used  to 
compute  the  near  field  for  several  wire  radii,  and 
compared  to  Junction.  The  frequency  was  300  MHz 
with  y  =  O.IA.  The  field  points  were  chosen  as  dose 
as  possible  to  the  plate,  but  not  so  close  that  the 
anomalies  might  obscure  the  results. 

Figs.  4a,  b  show  the  tangential  field  Ex  at  z/g  — 
±0.8  and  Fig.  4c  the  normal  field  Ez  &t  zfg  =  0.8. 
It  is  interesting  to  note  that  the  same  surface  rule 
gives  the  best  result  for  Ex  but  not  for  Ez-  Hence, 
it  is  not  possible  to  choose  a  wire  radius  that  is  si¬ 
multaneously  optimum  for  both  the  tangential  and 
normal  field  components.  It  is  also  noted  that  Ex  is 
more  sensitive  to  the  wire  radius  than  Ez. 

Fig.  4  also  shows  that  Ex  on  the  shadow  side  is 
more  sensitive  to  wire  radius  changes  than  on  the 
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Figure  3:  Near  field  of  the  plate  with  grid  size  g  =  0.1  A.  Comparison  of  NEC  wire  grid  (ooo)  and  Junction 
surface  patches  (— ).  (a)  Ex,  zjg  =  ±0.4  (b)  Ex,  zjg  =  ±0.8  (c)  E^,  zjg  =  0.4  (d)  E^,  zjg  =  0.8.  The 
field  point  is  in  the  y  =  0  plane. 


Figure  4:  Effect  of  wire  radius  on  the  near  field  of  the  plate,  with  grid  size  g  =  O.IA.  The  wire  radius 
that  satisfies  the  same  surface  rule  is  au,=0.0145  m.  Using  NEC,  with  Cu,  (ooo),  au,/2  (+++),  2a,j,  (xxx). 
Comparison  is  with  Junction  {— ).  (a)  Ex,  z/g  =  0.8  (b)  Ex,  zjg  =  -0.8  (c)  E^,  z/g  =  0.8.  The  field  point 
is  in  the  y  =  0  plane. 


E 


lit  side.  This  is  probably  because  the  accuracy  of 
^scatt  Qjj  shadow  side  is  more  critical,  as 
must  cancel  in  order  to  accurately  predict  the 
low  field  level  on  the  shadow  side  of  the  plate. 
was  examined  separately,  and  was  foimd  to  be  much 
less  sensitive  to  wire  radius  changes  than  the  total 
field 

A  higher  frequency  of  750  MHz  was  also  tried. 
The  corresponding  grid  size  in  this  case  is  0.25A.  The 
results  were  stiU  good,  with  typical  errors  of  2  dB  or 
less.  The  sensitivity  with  respect  to  wire  radius  was 
increased  for  Ex-,  in  the  shadow.  The  sensitivity  of 
Ez  remained  relatively  weak. 

D.  Field  Between  the  Wires 

The  previous  results  were  taken  along  y  =  0,  which 
happens  to  be  above  a  wire.  To  see  what  happens  in 
other  situations,  a  contour  plot  in  the  x—y  plane  was 
generated,  from  which  some  qualitative  observations 
could  be  made.  Also,  to  obtain  a  quantitative  com¬ 
parison,  a  cut  along  y  =  y/2,  which  is  in  between  the 
wires,  was  used.  The  heights  used  were  the  same  as 
before,  i.e.  z/g  =  ±0.4  and  zjg  =  ±0.8. 

It  was  found  that  very  little  changed  with  re¬ 
spect  to  the  peak  to  peak  amplitude  of  the  near 
field  anomalies.  The  average  field  level  changed  only 
slightly.  By  comparing  the  y  =  0  and  y  =  y/2  cases 
it  was  foimd  that  with  y  =  y/2,  Ex  increased  by 
0.7  dB,  and  Ez  decreased  by  1.3  dB.  This  seems  cor¬ 
rect,  as  the  wire  tends  to  short  out  Ex  and  support 
the  surface  charge  that  is  associated  with  Ez- 

3  Near  Field  of  the  Sphere 

The  wire  grid  sphere  is  shown  in  Fig.  5.  The  surface 
patch  model  is  similar,  except  that  each  quadrilat¬ 
eral  axea  is  divided  into  two  triangular  patches.  The 
sphere  is  centered  at  the  origin,  and  its  radius  is 
a  =  15  m.  The  wire  spacing  is  A0  =  =  tt/S. 

At  the  equator  the  grid  size  is  y  =  5.853  m  and  the 
wire  radius  is  a^j  =  0.9128  m.  Near  the  poles  the 
grid  size  is  smaller  and  the  wire  radii  are  adjusted  in 
accordance  with  the  same  surface  rule. 

The  incident  field  is  a  plane  wave  with  an  am¬ 
plitude  of  1  V/m,  polarized  in  the  y  direction  and 
traveling  in  the  +x  direction.  It  is  given  by  = 
-g-jfca:  jjj  gj2  cases  the  total  field  E  =  ±  E^<^* 


Figure  5:  Wire  grid  model  for  the  sphere.  The  ra¬ 
dius  is  15  m.  Ay  polarized  plane  wave  of  1  V/m  is 
incident,  traveUing  along  ±x.  The  field  point  is  at 
a  height  h  above  the  surface,  at  the  equator  where 
0  =  90°  and  0  <  (^  <  360°. 

is  plotted,  using  a  reference  of  0  dB  =  1  V/m.  The 
field  point  is  at  a  height  h  above  the  surface,  at  the 
equator  where  9  =  90°  and  0  <  ^  <  360°.  The  fre¬ 
quency  was  chosen  as  12.8  MHz  so  that  ka  =1.6  and 
g  =  O.IA. 

A.  Junction  and  Exact  Solution  for  the  Sphere 

The  Junction  surface  patch  model  of  the  sphere  was 
evaluated  by  comparison  with  an  exact  solution  ob¬ 
tained  from  an  eigenfunction  series  [15].  The  field 
was  calculated  at  a  height  of  /i  =  2.34  m  off  the 
surface,  so  that  h/g  =  0.4.  The  agreement  was  so 
close  as  to  be  almost  indistinguishable,  suggesting 
that  Junction  is  highly  suitable  for  the  computation 
of  the  near  field  of  a  smooth  body.  As  a  reference  so¬ 
lution,  the  eigenfunction  series  was  used  in  the  sub¬ 
sequent  validations. 

B.  Extent  of  the  Near  Field  Anomalies 

The  field  was  calculated  with  NEC,  at  heights  of 
h  =  2.34  m  and  4.68  m  off  the  surface,  so  that 
h/g  =  0.4  and  0.8,  respectively.  Fig.  6  shows  that 
for  h/g  =  0.4,  on  the  lit  side,  the  anomalies  in  E^ 
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and  Er  axe  of  comparable  magnitude.  For  the 
onset  of  anomalies  occurs  at  about  h/g  =  0.4  on  the 
lit  side,  and  h/g  =  —0.8  on  the  shadow  side.  Similar 
behavior  was  noted  for  the  plate,  so  these  aspects 
appear  to  be  independent  of  the  precise  shape  of  the 
scattering  body. 

A  higher  frequency  was  also  tried.  Using  ka  =  A 
and  g  =  0.25A,  excellent  agreement  with  the  exact 
solution  was  observed,  and  we  found  that  h/g  >  0.8 
was  still  a  good  criterion  for  avoiding  anomalies. 

C.  Test  of  the  Same  Surface  Rule 

The  field  was  calculated  with  NEC  at  a  height  of 
h  =  4.68  m  off  the  surface,  so  that  h/g  =  0.8.  In 
Fig.  7  we  see  that  the  same  surface  riile  gives  the 
best  result  for  both  E^  and  Er-  The  cases  of  too 
thick  wires  (2au,)  and  too  thin  wires  {ayj/2)  straddle 
the  exact  result.  (The  case  using  the  same  surface 
rule  was  not  plotted,  as  it  is  indistinguishable  from 
the  exact  solution.)  This  is  unlike  the  plate,  where 
the  same  surface  rule  worked  for  the  tangential  field, 
but  not  for  the  normal  field. 

The  tangential  E  for  the  sphere  was  much  less 
sensitive  to  wire  radius  changes  than  the  plate.  This 
is  not  because  of  the  shape,  but  because  the  field  in 
the  shadow  of  the  plate  is  much  lower.  As  already 
mentioned  in  Section  2  C,  errors  in  the  scattered  field 
are  more  evident  when  the  incident  and  scattered 
fields  are  supposed  to  be  cancelling. 

The  internal  field  of  a  closed  body  is  known  to 
be  a  sensitive  indicator  of  the  quality  of  a  moment 
method  solution,  as  cancellation  of  the  incident  field 
must  take  place  inside  the  scatterer.  This  was  not 
explored  here,  as  there  is  no  corresponding  test  that 
can  be  used  for  the  plate. 

A  higher  frequency  using  ka  =  A  and  g  =  0.25A 
was  also  tried.  It  was  found  that  the  sensitivity 
with  respect  to  wire  rariius  of  E^  was  increased  on 
the  shadow  side,  and  hardly  affected  on  the  lit  side. 
The  sensitivity  of  Er  remained  relatively  weak.  Even 
at  this  higher  frequency,  the  results  remained  good, 
with  errors  on  the  order  of  1  dB  or  less  when  the 
same  surface  rule  was  obeyed. 


Figure  7:  Effect  of  wire  radius  on  the  near  field  of 
the  sphere,  with  grid  size  g  =  O.IA  and  h/g  =  0.8. 
The  wire  radius  that  satisfies  the  same  surface  rule 
is  au;=0.9128  m.  Using  NEC  with  Ou,/2  (+++),  20^, 
(xxx).  Comparison  is  with  exact  solution  ( — ).  (a) 
E^  (b)  Er- 
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4  Comparison  of  NEC  and  MBC 


The  NEC  and  MBC  wire  codes  were  used  to  compute 
the  near  field  of  the  sphere  at  /1/5  =  0.4.  Fig.  8  shows 
that  the  positions  and  amplitudes  of  the  anomalies 
axe  very  similar.  This  is  noteworthy,  as  the  two 
codes  are  quite  different,  i.e.  NEC  uses  sine  and 
cosine  basis  functions  with  point  matching,  whereas 
MBC  uses  piecewise  sinusoids  and  Galerkin’s  method. 
Other  tests  using  MBC  revealed  that  the  extent  of 
near  field  anomalies  and  the  dependence  on  wire  ra¬ 
dius  were  very  similar  to  NEC.  Hence,  the  comments 
made  in  previous  sections  with  regard  to  the  NEC 
wire  grid  models  would  seem  to  apply  to  MBC  as 
well.  The  square  plate  was  also  tried,  and  .similar 
near  field  behavior  was  found  using  both  codes. 

5  Conclusion 

The  near  field  of  a  smface  patch  model  is  smoother 
than  for  a  wire  grid  with  the  same  segmentation  size. 
Nevertheless,  a  wire  grid  was  found  to  give  good  re¬ 
sults  when  the  observer  is  a  small  distance  h  off  the 
surface,  provided  that  h/g  >  0.4  on  the  lit  side,  and 
^  0-8  on  the  shadow  side.  This  was  found  to 
be  true  for  both  the  plate  and  the  sphere,  and  was 
tested  for  several  grid  sizes  and  frequencies. 

Use  of  the  same  surface  rule  gives  the  best  result 
for  the  near  fields  most  of  the  time  but  not  all  of  the 
time.  It  worked  for  the  tangential  E  for  the  plate 
and  sphere,  for  the  normal  E  on  the  sphere,  but  not 
for  the  normal  E  on  the  plate. 

The  tangential  E  was  more  sensitive  than  the 
normal  E  with  respect  to  wire  radius,  and  the  great¬ 
est  errors  occured  in  the  tangential  component  when 
the  same  surface  rule  was  violated.  The  greatest 
sensitivity  occured  with  the  field  point  in  the  deep 
shadow,  where  the  incident  and  scattered  fields  are 
supposed  to  cancel.  On  the  lit  side,  the  effect  of 
changing  the  wire  radius  was  much  smaller. 

The  type  of  MM  formulation  used  in  the  wire 
codes  was  not  a  major  factor,  as  similar  results  were 
obtained  from  the  NEC  code  and  MBC  code. 


(b) 


Figure  8:  Comparison  of  NEC  ( — )  and  MBC  (ooo) 
for  the  sphere,  with  grid  size  g  =  O.IA  and  h/g  =  0.4. 
(a)  E<f,  (b)  Er. 
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ABSTRACT  -  In  this  paper,  the  Finite-Difference 
Time-Domain  (FDTD)  method  was  used  to 
determine  the  reflection  coefficient  vs.  frequency  of  a 
Vivaldi  Tapered  Dual-Slot  Antenna  fed  through  a 
planar  balun.  The  reflection  coefficient  at  the  input 
port  of  the  feed  stripline  was  determined  through  the 
separate  computation  of  the  reflection  coefficient  of 
the  antenna  section  and  the  scattering  matrix  of  the 
balun,  assuming  that  the  electromagnetic  coupling 
between  these  sections  was  negligible.  Experimental 
results,  which  were  obtained  on  a  prototype  of  this 
structure,  confirmed  the  validity  of  the  proposed 
approach  and  demonstrate  that  the  coupling 
between  the  two  sections  does  not  aflTect  the  accuracy 
of  the  model. 

1.  INTRODUCTION 

Vivaldi  tapered  slot  antennas  (TSAs)  consist  of 
tapered  slots  etched  in  the  metallization  of  a  dielectric 
substrate  with  an  exponential  contour  for  the  slot  edges 
[1],[2].  These  traveling  wave  anteimas  provide  a 
symmetric  endfire  beam  with  appreciable  gain  and  low 
sidelobes.  Moreover,  they  offer  a  wideband 
performance  and  are  characterized  by  a  very  low 
profile.  Single  sided  or  bilateral  anteimas  can  be 
realized  with  the  feeding  section  integrated  on  the  same 
dielectric  substrate.  However,  owing  to  the  broadband 
characteristics  of  Vivaldi  TSA,  a  proper  matching 
section  must  be  used  to  couple  the  antenna  with  the 
feeding  transmission  line. 

In  this  work,  the  FDTD  method  was  used  to 
detennine  the  reflection  coefficient  at  the  input  port  in 
the  feed  stripline  of  a  dual-slot  Vivaldi  TSA.  A  planar 
balun  was  used  to  couple  the  antenna  section  to  the 
stripline  [3], [4].  The  FDTD  analysis  was  carried  out  by 
splitting  the  problem  in  two  separate  ones:  the 
determination  of  the  reflection  coefficient  of  the  antenna 
section  and  of  the  scattering  matrix  of  the  balim.  Then, 
assuming  that  the  feeding  and  radiating  sections  of  the 
antenna  were  not  electromagnetically  coupled,  the 
reflection  coefficient  at  the  input  section  of  the  stripline 
was  computed.  Results  were  successfully  compared 


with  a  few  ejqjerimental  data  obtained  at  Elettronica 
S.p.A.  (Rome)  where  a  prototype  of  this  antenna  was 
built  up  thus  confirming  the  assumption  validity. 

Because  the  coupling  between  the  two  sections  of  the 
wiiole  antenna  can  be  ignored  without  adversely 
affecting  accuracy,  an  FDTD-based  equivalent  network 
representation  of  the  wfiole  structure  can  be  than 
derived  which  would  allow  a  simple  and  efficient 
numerical  analysis  of  the  electromagnetic  response  of 
these  devices.  Moreover,  this  approach  allows  an 
accurate  refinement  of  the  grid  for  the  two  disconnected 
sections,  greatly  reducing  the  need  for  computational 
resources  thus  fully  exploiting  the  capability  of  the 
FDTD  method. 

2.  ANTENNA  CONFIGURATION 

A  schematic  view  of  the  antenna  section  with 
the  feeding  transmission  line  and  the  planar  balun  is 
depicted  in  Fig.  1 .  The  balun  was  designed  according  to 
the  scheme  proposed  in  [4].  Basically,  it  consists  of  a 
stripline-to-slot  transition  realized  by  using  two  stubs, 
i.e.  one  terminating  the  stripline  and  the  other  the 
slotline  which  directly  feeds  the  antenna.  The  overall 
dimensions  for  the  antenna  and  the  feeding  sections  are 
reported  in  the  same  figure.  This  antenna  is  expected  to 
properly  operate  in  the  6-18  GHz  fi'equency  range. 

3.  FDTD  ANALYSIS 

The  FDTD  method  [5],[6]  consists  of  the 
numerical  solution  of  Maxwell's  equations  in  time 
domain  through  a  finite  difference  approximation  of  the 
partial  derivatives  which  appear  in  curl  operators.  A 
numerical  scheme  analogous  to  that  of  Yee's 
formulation  [7]  was  implemented,  except  for  a  variable 
grid  step  size  that  was  introduced  to  properly  describe 
field  behavior  in  the  various  parts  of  the  structure.  In 
each  simulation,  a  Gaussian-shaped  signal  was  used  as 
time-varying  excitation  source.  Since  simulations  were 
run  separately  for  the  antenna  and  the  feeding  sections, 
a  brief  description  of  the  approach  used  in  each  case 
here  follows: 
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Fig.  1:  the  Vivaldi  antenna  and  the  feeding  balun. 
a)  antenna  section 

The  grid  step  size  was  linearly  changed  through 
the  mesh  domain  to  provide  a  better  tracking  of  the 
exponential  contour.  This  means  that,  in  the  x  direction. 
Ax’  =  a  Ax  where  Ax’,  Ax  are  the  step  sizes  of  two 
contiguous  cells  and  a  is  a  coefBcient  that  would 
depend  on  the  design  parameter  of  the  exponential 
profile. 

A  stair  casing  approximation  was  used  to  describe  this 
contour  according  to  the  fact  that  a  conformal  approach 
is  not  expected  to  give  a  significant  improvement  in  the 
results  [8].  Due  to  the  symmetries  of  this  dual-slot 
configuration  (see  Fig.  2a),  the  FDTD  mesh  domain  was 
reduced  to  a  quarter  of  the  total  domain  by  enforcing  an 
electric  and  a  magnetic  wall  in  the  x=0  and  z=0  planes, 
respectively.  The  minimum  and  maximum  grid  step 
sizes  along  the  x  direction  were  90  pm  and  130  pm, 
respectively.  On  the  other  hand,  in  the  y  and  z 
directions,  a  constant  grid  step  size  of  110  pm  along  y 
and  102  pm  along  z  was  used.  A  grid  with  161  x  296  x 
41  nodes  along  x,y,z,  respectively,  was  obtained.  Since 
the  maximum  step  size  is  about  X/5Q  at  40  Ghz,  a 
reasonable  accuracy  in  the  results  is  expected  at  least  up 
to  that  fi'equency. 

A  voltage  source  excited  pulse  propagation  in 
the  slotline.  With  reference  to  Fig.  2a,  this  source  was 
located  at  AA’  plane  (i.e.  the  excitation  plane)  while 


simulation  results  were  stored  in  terms  of  voltage  at  BB’ 
plane  (i.e.  the  input  plane).  The  distances  BB’  -  AA’ 
and  BB'  -  HH'  were  properly  chosen  (20  and  30  cells, 
respectively)  in  order  to  remove  the  evanescent  modes 
launched  at  the  discontinuity  planes.  The  reflection 
coefficient  T^  at  the  input  plane  of  the  antenna  was 
determined  according  to  the  following  procedure: 
a.l  -  a  first  simulation  was  run  with  the  slotline  alone 
terminated  on  a  matched  load.  This  allowed  to  sample  a 
clean  incident  signal  (i.e.  without  antenna  reflections)  at 
BB'  plane.  Litva’s  second  order  boundary  conditions  [9] 
were  used  to  simulate  a  matched  load  at  the  end  of  the 
slotline.  Since  these  conditions  require  the  knowledge  of 
phase  velocities  at  the  lower  and  the  upper  frequency  of 
the  range  of  interest,  other  FDTD  simulations  were  run 
before,  in  order  to  compute  the  propagation  constant  vs. 
fi-equency  in  the  slotline; 

a.2  -  a  second  simulation  was  run  for  the  entire  antenna 
section  and  the  total  voltage  response  detected  at  the 
input  BB'  plane.  Since  this  signal  would  correspond  to 
the  incident  one  previously  computed  plus  the  reflected 
signal  fi-om  antenna  terminals,  the  latter  was  then 
extracted  by  subtraction.  Litva's  conditions  were  used  to 
simulate  a  matched  load  at  the  slotline  end,  wfiile 
Higdon's  second  order  boundary  conditions  [10]  were 
enforced  to  absorb  the  waves  radiated  fi’om  the  antenna; 
a.3  -  finally,  the  reflection  coefficient  vs.  fi-equency  was 
computed  through  the  ratio  of  the  FFT  of  the  reflected 
and  incident  signals. 

b)  feeding  section 

Since  the  feeding  section  is  a  two-port 
symmetric  network  with  losses,  three  of  the  four 
parameters  of  its  scattering  matrix  must  be  determined. 
With  reference  to  Fig.  2b,  this  was  carried  out  as 
follows: 

Sj  I  and  S22  parameters:  a  pulse  signal  was  launched  in 
the” striplihe’  by  exciting  the  field  distribution  shown  in 
the  same  figure,  at  the  excitation  CC’  plane.  In  order  to 
properly  compute  the  voltage  of  the  TEM  incident 
wave  at  the  input  DD’  plane,  a  first  simulation  was  run 
by  terminating  the  stripline  with  a  matched  load.  Due  to 
the  TEM  propagation  in  the  line  (i.e.  normal  field 
incidence  on  the  boundary),  matching  conditions  were 
obtained  by  enforcing  there  Mur’s  first  order  absorbing 
boundary  conditions  [11].  Then,  a  second  simulation 
was  run  for  the  entire  feeding  section  and  voltage 
signals  stored  at  DD’and  BB’  planes.  Since  port  2  of  the 
network  corresponds  to  the  slotline  end,  Litva’s 
absorbing  boundary  conditions  were  enforced  there  to 
simulate  a  matched  load  (see  a.l  above).  Sjj  and  S12 
vs.  frequency  were  computed  through  the  ratio  of  the 
FFT  of  the  reflected  and  transmitted  signals  with  the 
incident  one. 
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S22  parameter:  with  the  stripline  terminated  in  a 
matched  load  a  simulation  was  run  by  launching  a  pulse 
signal  at  EE’  plane  and  by  storing  the  total  signal  at  BB’ 
plane,  respectively.  The  determination  of  S22 
immediately  follows  since  the  incident  signal  at  BB’ 
plane  was  already  known  from  the  previous  simulations 
(see  a.l). 


metaliijation 
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Fig.  2a,b:  schematic  view  of  (a)  the  Vivaldi  antenna  and  (b) 
the  planar  balun.  In  both  figures,  the  reference  planes  for 
FDTD  analysis  ^e  indicated.  In  Fig.  2b,  the  exciting  field 
distribution  in  the  feed  stripline  is  also  depicted. 


The  same  FDTD  grid  was  used  in  each 
simulation.  The  minimum  and  maximum  grid  step  sizes 
were  70  lun  and  115  pm  along  the  x  direction  and  90 
pm  and  120  pm  along  the  y  direction.  Again,  a  constant 
grid  step  size  of  102  pm  along  the  z  direction  was  used. 
A  grid  with  110  x  110  x  41  nodes  along  x,y,z, 
respectively,  was  obtained. 

The  total  reflection  coefficient  Tjjj  at  the 
stripline  input  DD'  plane  was  determined  by  using  the 
following  equation: 


rin  =  s 
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on  the  assumption  that  the  antenna  and  the  feeding 
sections  were  not  electromagnetically  coupled. 

In  this  FDTD-derived  equivalent  network 
model  the  radiating  section  acts  as  a  lumped,  frequency 
dependent  impedance  loading  the  feeding  network 
output  terminals. 

4.  RESULTS 

In  Fig.  3,  the  reflection  coefficient  amplitude  vs 
frequency  at  the  input  BB’  plane  of  the  antenna  section 
is  shown.  Results  indicate  the  broadband  behavior  of 
this  kind  of  antenna  up  to  40  GHz,  with  the  appearance 
of  local  maxima  and  minima  due  to  the  antenna  finite 
dimensions  and  the  dielectric  truncation  at  its  far  end. 

In  Fig.  4,  the  amplitude  of  the  scattering 
parameters  vs  frequency  for  the  feeding  section  is 
reported.  Results  obtained  for  S^j,  S12  (§12)  3^'“^  S22 
(i.e.  Fig.  4a,  4b  and  4c,  respectively)  aJlow  to  determine 
the  bandwidth  of  this  matching  network  vriiich  covers 
the  6-18  Ghz  operating  frequencies.  A  strong 
mismatching  does  appear  out  of  that  frequency  range. 

Results  obtained  for  the  reflection  coefficient  at 
the  input  section  of  the  stripline  with  the  antenna 
loading  are  shown  in  Fig.  5.  In  the  same  figure,  the 
experimental  results  measured  by  using  a  WILTRON 
360  Network  Analyzer  are  reported.  Measurements 
were  carried  out  using  a  standard  SMA  cormector-to- 
stripline  transition  that  it  was  experimentally  found  to 
have  negligeable  effects  on  results  accuracy. 

The  agreement  between  simulation  results  and 
measurements  is  quite  satisfactory  and  validate  the 
assumption  of  electromagnetic  decoupling  between  the 
radiating  structure  and  the  feeding  network. 

5.  CONCLUSIONS 


In  this  work  the  FDTD  method  was  used  to 
analyze  the  electromagnetic  behavior  (i.e.  in  terms  of 
scattering  parameters  and  reflection  coefficients),  of  a 
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Fig.  3:  amplitxide  vs.  frequency  of  the  reflection  coefiScient  of 
the  Vivaldi  antenna  at  BB’  plane  in  the  slotline. 


Fig.  4:  amplitude  vs.  frequency  of  the  Sn,  S12  (S21)  and  S22 
scattering  parameters  of  the  planar  balun. 


Fig.  5:  reflection  coefficient  of  the  Vivaldi  antenna  with  the 
planar  balun  at  the  input  plane  DD’  in  the  stripline: 
continuous  line  -  numeric^  results,  circles  -  measured  data 

Vivaldi  tapered  dual-slot  antenna  fed  through  a  planar 
balun. 

Despite  of  the  complexity  of  the  entire 
structure,  a  suitable  approximation  was  made  which 
allowed  to  separately  analyze  the  radiating  section  and 
the  planar  balun.  Numerical  results  did  show  the 
intrinsic  broadband  characteristics  of  the  Vivaldi 
antenna  and  allowed  the  determination  of  the  bandwidth 
of  the  balim.  Experimental  data  for  the  reflection 
coefficient  at  the  input  port  of  the  feed  stripline  did 
agree  with  simulation  results,  thus  indicating  the 
powerfulness  of  FDTD  in  dealing  with  these  complex 
radiating  structures  and  the  validity  of  the  proposed 
approach. 

Further  investigations  will  concern  the  analysis 
of  the  scattering  matrix  of  more  complex  planar  baluns 
and  the  determination  of  the  electromagnetic  coupling 
between  a  few  adjacent  radiating  elements,  i.e.  a 
subarray  of  two,  three  or  four  Vivaldi  antennas. 
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Abstract — To  maximize  light-trapping,  the  absorp¬ 
tion  of  light  in  the  solar  cell  is  maximized.  The  ways 
to  increase  light-trapping  are  to  texture  the  surfaces 
of  the  solar  cell  and  to  use  anti-reflection  coatings. 
The  power  spectrum  of  sunlight  also  plays  an  im¬ 
portant  role  in  light-trapping.  In  general,  a  solar 
cell  consists  of  multiple  layers  of  dielectric  materials. 
Each  dielectric  has  a  complicated  surface  texture  ge¬ 
ometry  to  increase  light-trapping.  This  paper  con¬ 
centrates  on  solving  Maxwell’s  equations  for  the  gen¬ 
eral  solar  cell  conflguration  under  illumination  from 
the  sun.  The  absorption  and  maximum  achievable 
current  density  are  calculated  and  used  to  quantify 
light-trapping  in  a  given  solar  cell  design. 

Thin  solar  cells  promise  to  yield  higher  current  col¬ 
lection  than  thick  solar  cells  at  a  lower  cost  [1].  Low 
cost  solar  cells  are  usually  characterized  by  short  dif¬ 
fusion  length  semiconductors.  Most  minority  carri¬ 
ers  created  within  the  distance  equal  to  the  diffusion 
length  contribute  to  the  electrical  current  of  a  so¬ 
lar  cell.  Hence,  the  solar  cell  must  be  thin  when 
low  quality  materials  are  used.  As  solar  cells  de¬ 
crease  in  size,  the  ray-trace  model  becomes  inac¬ 
curate  cis  previously  demonstrated  in  [2].  A  full- 
wave  Finite-Difference  Time-Domain  (FDTD)  light¬ 
trapping  model  is  demonstrated  to  accurately  study 
light-trapping  of  thin-iilm  solar  cells. 

I.  Introduction 

Solar  cells  are  semiconductor  devices  designed  to 
convert  light  into  electrical  power.  In  order  to  be 
cost  effective  they  must  be  as  efficient  as  possible. 
Solar  cell  efficiency  is  related  to  the  percent  of  the 
incident  sunlight  converted  into  electrical  power. 
One  goal  in  the  design  of  solar  cell  devices  is  to 
trap  and  absorb  as  much  light  as  possible  inside 
the  semiconductor.  In  the  language  of  the  solar  cell 
community,  the  goal  is  to  maximize  light-trapping. 

One  effective  way  to  maximize  light-trapping  is 
to  coat  the  semiconductor  with  a  thin  film  which 
reduces  reflection.  This  anti-reflective  coating  en¬ 
sures  a  higher  percentage  of  light  enters  the  cell  so 
more  light  is  absorbed.  A  second  effective  approach 
to  increase  light-trapping  is  to  texture  the  surface 
of  the  solar  cell  [1],  [3],  [4],  [5],  [6],  [7],  [8],  [9].  The 
texture  creates  multiple  reflections  which  will  lead 


to  increased  light-trapping.  The  specific  design  of 
the  textures  is  critical  to  maximize  light-trapping. 
All  commercial  solar  cell  designs  include  encapsu¬ 
lation  to  protect  the  solar  cell  from  nature.  The 
effect  of  encapsulation  on  light-trapping  is  not  well 
understood.  The  integration  of  the  anti-reflection 
coating,  encapsulation,  and  texture  into  the  solar 
cell  design  determines  the  overall  light-trapping  ca¬ 
pacity  of  a  solar  cell  [10]. 

Historically,  the  solar  cell  industry  did  not  have 
a  means  to  effectively  and  accurately  predict  the 
light-trapping  of  a  given  solar  cell  geometry.  Each 
solar  cell  was  manufactured  and  then  its  efficiency 
was  measured.  This  is  expensive  and  time  con¬ 
suming.  This  research  fills  the  need  of  the  solar 
cell  community  by  quantifying  and  predicting  the 
light-trapping  of  a  given  solar  cell  design  with  the 
aid  of  numerical  models.  The  result  of  this  re¬ 
search  provides  industry  a  means  to  quantify  the 
light-trapping  capability  of  many  different  three- 
dimensional  solar  cell  designs.  By  modeling  light¬ 
trapping,  the  expensive  and  time  consuming  step 
of  fabrication  is  streamlined.  Only  the  promising 
designs  need  to  be  fabricated  and  tested.  This,  in 
turn,  frees  valuable  resources  which  can  be  utilized 
to  design  more  efficient  solar  cells.  The  basics  of  nu¬ 
merical  solar  cell  analysis  emd  the  results  of  the  ray- 
trace  analysis  were  detailed  in  a  companion  paper 
[11].  A  reader  not  familiar  with  solar  cell  analysis 
is  encouraged  to  read  the  cited  paper  first. 

II.  Background 

A  full-wave  model  of  light-trapping  based  on 
the  finite-difference  time-domain  method  (FDTD) 
method  is  designed  to  include  three-dimensional 
surface  textures,  encapsulation,  anti-reflection  coat¬ 
ings,  the  energy  spectrum  of  sunlight  and  the  po¬ 
larization  of  light.  Unlike  the  ray-trace  model,  the 
FDTD  model  explicitly  includes  the  wave  properties 
of  light.  The  FDTD  model  accounts  for  diffraction 
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due  to  small  surface  textures.  Small  geometric  fea¬ 
tures  are  common  on  thin  solar  cells  designed  to 
optimize  light-trapping. 

This  research  uses  the  basic  FDTD  algorithm  first 
developed  by  Kane  Yee  in  1966  to  numerically  solve 
Maxwell’s  Equations  and  then  extended  by  Tafiove. 
It  has  been  extensively  documented  so  the  basic 
methodology  will  not  be  repeated  here  [12],  [13], 
[14].  The  general  solar  cell  geometry  with  surface 
texture,  encapsulation,  and  anti-reflection  coating  is 
described  based  on  the  requirements  of  the  FDTD 
algorithm.  A  periodic  boundary  is  introduced  to 
reduce  the  required  computer  resources.  Sunhght  is 
incorporated  into  the  FDTD  model  as  an  electro¬ 
magnetic  source.  Finally,  the  light-trapping  model 
is  completed  with  an  explanation  of  the  measure¬ 
ment  of  light-trapping.  The  paper  concludes  with 
a  general  discussion  concerning  the  limitations  and 
capabilities  of  FDTD  modeling  for  solar  cells. 


A.  Optical  Properties  of  Matter 

The  governing  equations  of  light-trapping  are 
Maxwell’s  equations 


dt 
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where  both  the  electric  field  and  magnetic  field  {E 
and  H)  describe  the  interaction  between  light  and 
the  solar  cell.  The  solar  cell  is  described  by  its  per¬ 
mittivity,  e,  and  permeability,  p. 

The  parameters,  e  =  Crto  and  a,  describe  the  op¬ 
tical  properties  of  the  solar  cell.  These  properties 
are  quantified  directly  or  through  the  optical  pa¬ 
rameters  of  refractive  index,  n,  and  the  extinction 
coefficient,  k.  The  quantities  are  related  by 


e,,  =  —  k"^ 

(3) 

a  =  2nkeou! 

(4) 

where  c  is  the  speed  of  light,  w  is  the  angular  fre¬ 
quency  of  light  and  eo  is  the  permittivity  of  free 
space  [15]. 

In  actual  practice,  the  light  illuminating  a  solar 
cell  experiences  dispersion.  Specifically,  the  index 
of  refraction  depends  on  the  wavelength  of  the  inci¬ 
dent  light.  The  extinction  coefficient  also  depends 
on  wavelength  and  determines  the  amount  of  light 
that  is  absorbed  in  the  solar  cell  [16].  The  index  of 
refraction  varies  over  a  wide  range  from  3.4  to  5.6 


which  means  dispersion  effects  are  important.  The 
extinction  coefficient  varies  by  orders  of  magnitude. 
The  extinction  coefficient  becomes  small  as  wave¬ 
length  increases  which  means  little  light  is  absorbed 
at  the  higher  wavelengths.  The  higher  wavelengths 
corresponds  to  lower  energy  photons.  Solar  cell  de¬ 
signs  must  accommodate  dispersion  to  ensure  ac¬ 
curate  light-trapping  predictions.  Solar  cell  designs 
must  also  overcome  the  small  extinction  coefficient 
by  maximizing  light-trapping. 

B.  Solar  Cell  Geometry 

Defining  the  solar  cell,  or  any  object,  in  the 
FDTD  model  requires  that  coefficients  be  calculated 
at  each  field  location  in  the  FDTD  grid  based  on 
the  material  properties.  The  coefficients  are  cal¬ 
culated  based  on  the  material  parameters  of  per¬ 
mittivity,  permeability,  and  conductivity.  Yee’s  al¬ 
gorithm  imposes  a  Cartesian  grid  on  the  solar  cell 
and  surrounding  space.  Consequently,  the  location 
of  a  field  relative  to  the  Yee  grid  and  the  material 
determines  the  values  of  the  coefficients. 

Similar  to  the  ray-trace  model,  the  FDTD  model 
defines  the  solar  cell  as  a  set  of  materials  and  in¬ 
terfaces.  Each  surface  separates  different  materials 
within  the  solar  cell.  The  surface  normal  is  not  ex¬ 
plicitly  required  by  the  FDTD  model.  Unlike  the 
ray-trace  model,  the  solar  cell  is  not  restricted  to 
using  three-node  planar  elements.  The  interfaces 
are  defined  as  surfaces  represented  by  an  arbitrary 
function  of  x-  and  y-coordinates,  K(n,  y)as  shown  in 
Figure  1.  The  surface  is  confined  to  the  rectangle 
between  Nxl  <—  i  <=  Nx2  and  Nyl  <=  j  <= 
Ny2:  For  each  field  component,  the  correspond¬ 
ing  material  parameter  is  defined.  This  definition 
is  generalized  to  multiple  surfaces  by  defining  N  in¬ 
terfaces;  Kl, K2, K3 . .  .KN  as  necessary. 

The  encapsulation  layer  and  anti-reflection  coat¬ 
ings  are  incorporated  into  the  FDTD  model  by 
defining  the  necessary  number  of  surfaces.  One  ben¬ 
efit  of  using  surfaces  to  define  the  solar  cell  is  the 
simplicity  of  defining  a  anti-reflection  coating  with 
as  many  layers  as  needed.  In  contrast,  the  ray-trace 
model  only  incorporated  a  two-layer  anti-reflection 
coating  while  the  FDTD  model  is  unlimited.  There 
is  no  difference  between  defining  the  solar  cell  and 
the  anti-reflection  coating,  or  even  the  encapsula¬ 
tion  layers. 

C.  Periodic  Boundary  Condition 

Scanning  Electron  Microscope  pictures  of  chem¬ 
ically  etched  textured  solar  cells  show  a  periodic 
structure  [1].  This  allows  us  to  model  the  textured 
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solar  cell  as  an  infinite  grid.  The  infinite  grid  allows 
us  to  use  a  periodic  boundary  condition  which  con¬ 
serves  valuable  computer  resources.  In  the  ray-trace 
model,  the  periodic  boundary  is  simple  to  imple¬ 
ment  by  checking  when  a  light  ray  hits  a  periodic 
boundary.  On  the  contrary,  the  incorporation  of 
the  periodic  boundary  into  the  FDTD  algorithm  is 
complicated  by  the  light  interacting  with  the  entire 
periodic  boundary  simultaneously.  The  light  is  not 
localized  to  a  single  thin  ray  at  a  single  point  on 
the  periodic  boundary.  Instead,  the  light  interacts 
along  the  entire  periodic  boundary.  The  FDTD  al¬ 
gorithm  models  the  entire  light  wave.  As  a  result, 
the  periodic  boundary  contends  with  the  spatial  dis¬ 
tribution  of  power  which  is  inherent  to  light  travel¬ 
ing  in  various  directions. 

Since  the  FDTD  algorithm  calculates  the  values 
of  field  components,  the  periodic  boundary  is  de¬ 
fined  relative  to  field  locations  in  the  Yee  grid.  Fig¬ 
ure  1  shows  the  periodic  boundary  and  the  Yee  field 
locations  at  two  different  heights  within  the  solar 
cell.  The  views  show  two  slices  of  the  solar  cell 
which  are  perpendicular  to  the  periodic  boundary. 
The  first  view  shows  a  planar  slice  at  a  height  of 
k  to  coincide  with  the  fields  in  the  bottom  half  of 
Yee’s  unit  cell.  The  x-  and  y-components  of  the 
electric  field  and  the  z-component  of  the  magnetic 
field  are  contained  in  the  transverse  electric  plane 
(TEl-plane).  Note,  the  TE-plane  has  no  relation  to 
TE  modes  in  a  waveguide.  The  TE-plane  is  de¬ 
fined  because  the  electric  field  components  lie  in 
the  plane  and  are  transverse  to  the  normal  z  di¬ 
rection.  The  field  components  are  drawn  as  small 
arrows  for  the  x-  and  y-components  of  the  electric 
field  and  as  a  small  circle  for  the  magnetic  field. 
The  periodic  boundary  surrounds  the  silicon  and  is 
confined  on  the  electric  fields  which  are  bounded 
by  the  rectangle;  iVxl  <=  i  <=  Nx2  +  1  and 
Nyl  <=  j  <=  Ny2  -I-  1.  The  Transverse  Mag¬ 
netic  plane  (TM-plane)  is  shown  in  the  second  pla¬ 
nar  slice.  The  TM  plane  is  a  slice  through  the  solar 
cell  at  a  height  of  A:  -I- 1/2  to  coincide  with  the  upper 
half  of  a  Yee  unit  cell.  The  TM-plane  contains  the 
X-  and  y-components  of  the  magnetic  field  and  the 
z-component  of  the  electric  field.  Again,  the  field 
components  are  drawn  as  small  arrows  for  the  x-  and 
y-components  of  the  magnetic  field  and  as  a  small 
circle  for  the  electric  field.  By  alternatively  stack¬ 
ing  TM-  and  TE-planes  one  on  top  of  the  other, 
any  FDTD  model  of  the  solar  cell  is  defined.  It  is 
sufficient  to  incorporate  the  periodic  boundary  into 
both  TM-  and  TE-planes  separately  and  then  ap¬ 
ply  the  corresponding  boundary  condition  to  every 


planar  stack  in  the  entire  FDTD  grid. 

Each  field  component  in  the  Yee  grid  only 
depends  on  its  nearest  neighbors.  For  exam¬ 
ple,  when  calculating  Ex{i,j,k),  the  nearest  sur¬ 
rounding  fields  used  by  the  update  equation  are 

Hxii,j,k),Hx{i,  j-l,k),Hy{i,j,  k),  and  Hy{i,j,k- 

1).  The  nearest  neighbors  can  be  found  graphically 
by  drawing  the  smallest  contour  path  perpendicu¬ 
lar  to  the  field  to  be  updated  [13].  This  graphical 
interpretation  of  Yee’s  equations  is  true  for  all  com¬ 
ponent  fields  in  the  Yee  grid.  By  drawing  contours, 
the  fields  whose  contours  do  not  overlap  the  periodic 
boundary  are  known  to  be  unaffected  by  the  peri¬ 
odic  boundary.  However,  the  fields  whose  contours 
overlap  the  periodic  boundary  are  affected  by  the 
periodic  boundary.  The  fields  that  are  unaffected 
by  the  periodic  boundary  are  colored  light  grey  in 
Figure  1.  While  the  field  components  colored  dark 
black  in  Figure  1  are  affected  by  the  periodic  bound¬ 
ary.  The  dark  black  fields  correspond  to  Yee  update 
equations  that  need  to  be  modified  to  incorporate 
the  periodic  boundary.  As  an  example,  the  Ex  field 
along  the  boundary,  j  =  Nyl,  uses  the  surround¬ 
ing  fields;  Hz{i,j,k),  -  1,*),  Hy{i,j,k),  and 

Hy{i,j,k—1).  The  Hz{i,j-l,k)is  not  the  “correct” 
field  to  use  because  of  the  periodic  boundary.  In¬ 
stead,  the  field  to  use  is  Hz{i,  Ny2,  k)  which  comes 
from:  “wrapping”  the  solar  cell  around  the  bound¬ 
ary.  The  field  outside  of  the  boundary  is  replaced  by 
a  field  inside  the  geometrically  opposing  boundary. 
In  general,  the  periodic  boundary  is  incorporated 
into  the  FDTD  algorithm  by  “wrapping”  the  inner 
fields  on  the  opposing  boundary  onto  the  outer  fields 
on  the  periodic  boundary. 

D.  Light  Source 

Sunlight  is  incorporated  into  the  FDTD  model 
based  on  the  standard  total-field/scattered-field  for¬ 
mulation  (TFSF)  of  a  light  source  [13].  The  stan¬ 
dard  TFSF  defines  a  closed  surface.  Inside  the 
closed  surface,  an  arbitrary  plane  wave  is  excited  by 
adding  source  fields  on  the  closed  surface.  The  mag¬ 
nitude  of  the  source  fields  determines  the  intensity 
of  the  light.  The  direction  of  the  source  fields  deter¬ 
mine  the  polarization  of  the  incident  light.  Outside 
the  surface,  the  source  fields  are  removed  so  only 
scattered  waves  exist  outside  of  the  TFSF  surface. 

The  standard  TFSF  does  not  include  the  periodic 
boundary.  The  periodic  boundary  removes  the  need 
for  a  closed  TFSF  surface  which  makes  the  light 
source  trivial  to  implement  in  the  FDTD  model. 
Instead  of  a  closed  surface,  a  single  plane  is  used. 
The  TFSF  plane  is  placed  above  the  solar  cell  with  a 
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Fig.  1.  The  periodic  bound2iry  in  the  FDTD  algorithm. 


constant  surface  normal  in  the  z  direction  as  shown  ensures  that  the  wave  propagates  correctly.  Light 
in  Figure  2.  Electric  and  magnetic  fields  are  excited  only  propagates  in  the  ±z  direction,  parallel  to  the 
on  the  TFSF  plane  which  excite  the  propagation  periodic  boundary, 
of  sunlight  in  the  model.  The  periodic  boundary 

E.  Radiation  Boundary  Condition 


Fig.  2.  The  light  source  in  the  FDTD  model  with  a  periodic 
boundary. 


As  is  the  case  with  the  ray-trace  model,  light 
which  escapes  from  the  solar  cell  causes  numerical 
errors  unless  an  effective  radiation  boundary  con¬ 
dition  is  implemented.  Unlike  the  ray-trace  model, 
the  radiation  or  absorbing  boundary  is  difficult  to 
implement  into  the  FDTD  model.  There  are  many 
different  solutions  to  the  RBC  but  they  all  have 
inherent  limitations  [17],  [18],  [19],  [20],  [21],  [22]. 
None  of  the  various  RBC’s  perfectly  absorb  all  of 
the  light.  Some  fraction  of  the  light  is  always  re¬ 
flected  back  into  the  model.  The  Berenger  Perfectly 
Matched  Layer  (PML)  is  the  best  algorithm  for  re¬ 
ducing  the  reflection  error  [23],  [24].  To  incorporate 
the  PML  into  the  light-trapping  model,  the  PML 
is  modified  to  include  the  periodic  boundary.  The 
periodic  boundary  in  the  PML  is  broken  into  two 
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steps,  just  like  the  normal  Yee  periodic  boundary. 
First,  the  PML  update  is  calculated.  Second,  the 
fields  are  wrapped  around  the  periodic  boundary. 


F.  Measurement  of  Light-  Trapping 

Absorption  is  the  measure  of  light-trapping.  The 
index  of  refraction  and  extinction  coefficient  vary  as 
a  function  of  frequency  via  the  relation  Aw  =  27rc. 
The  frequency  dependent  material  properties  of  the 
solar  cell  dictate  that  a  general  method  to  calculate 
absorption  is  required  by  the  light-trapping  model. 
The  FDTD  model  assumes  a  plane  wave  source  with 
a  general  amplitude  modulation  to  represent  light. 
A  common  source  modulation  used  is  a  Gaussian 
modulated  sinusoid  because  it  has  a  wide  spectral 
content,  is  smooth,  and  is  finite  in  duration.  The 
finite  duration  allows  actual  computation  run  times 
to  be  reduced  over  a  single  sinusoid  modulation. 
Also,  the  FDTD  algorithm  is  not  compatible  with 
sources  that  are  not  smooth.  The  calculation  of 
is  complicated  by  the  frequency  dependent  so¬ 
lar  cell  materials  and  the  general  input  source  mod¬ 
ulation  requirement. 

Absorption  is  the  time  averaged  fraction  of  power 
absorbed  to  power  incident  which  is  given  by 


A  = 


jJPgbsoTbed) 
{P source) 


(5) 


and  the  time  average  power  is 

T  /2 

iP)  =  i[  Pit)dt  (6) 

^  J-T/2 


The  total  power  absorbed  within  the  solar  cell  vol¬ 
ume,  V,  is 


Pabsorbed{t)  =  a  [  E{f,t)-E{f,t)dv  (7) 

Jv 

and  the  total  incident  power  for  a  plane  wave  source 
is 

P source  (t)  =  J^\E{r,t)\‘^/T)ods  (8) 

where  tjo  =  377  Q  is  the  free  space  impedance  . 
Since  the  input  source  is  a  modulated  sinusoid,  di¬ 
rect  application  of  Equation  5  will  give  the  total 
fraction  of  light  absorbed  not  the  desired  fraction 
absorbed  at  the  center  frequency,  wq.  Fourier  Anal¬ 
ysis  is  used  to  calculate  the  desired  A(wo).  Specif¬ 
ically,  Parseval’s  Theorem  is  used  to  transform  the 
time  average  integrals  to  the  frequency  domain. 
Parseval’s  Theorem  for  a  general  function,  f{t),  is 


In  words,  the  power  contained  in  the  source, 
f(t),  due  to  a  single  harmonic  component,  loq,  is 
|F(jwo)p/27r.  F{juJo)  is  Fourier  transform  of  f{t) 
evaluated  at  wq.  It  is  a  now  simple  matter  to  decom¬ 
pose  the  total  absorption  into  power  due  to  each  in¬ 
dividual  harmonic  using  Parseval’s  Theorem  to  find 


^  ^UJE{F,i^„)edv 

fs\E{f,juJo)\^/Tiods 


(10) 


The  time  averages  are  transformed  from  integrals 
in  the  time-domain  into  a  multiplication  of  the 
Fourier  transformed  electric  fields  in  the  frequency- 
domain.  A(wo)  in  Equation  10  is  one  measure 
of  light-trapping.  The  maximum  achievable  cur¬ 
rent  density  (MACD)  is  another  measure  of  light¬ 
trapping  which  takes  into  account  the  solar  spec¬ 
trum.  The  MACD,  Jsc,  is  defined  by 


J,,  =  q  F(X)A(X)  dX  (11) 

Jo 

where  q  is  the  charge  of  an  electron,  F{X)  is  the  solar 
spectrum  (usually  AM1.5)  of  the  incident  light,  and 
A(A)  comes  from  the  solution  to  Maxwell’s  equa¬ 
tions.  Once  the  absorption  spectrum  is  known  (as¬ 
suming  ideal  internal  spectral  response) ,  the  MACD 
is  calculated  from  Equation  11.  The  ideal  internal 
spectral  response  makes  it  possible  to  study  light¬ 
trapping  without  reference  to  any  specific  semicon¬ 
ductor  material  used  in  the  solar  cell.  In  this  way, 
silicon  as  well  as  any  other  semiconductor  can  be 
studied  based  solely  on  optical  considerations. 

Both  the  absorption,  A(A),  and  the  MACD  are 
used  to  quantify  light-trapping.  In  the  strict  sense, 
absorption  is  the  pure  measure  of  light-trapping. 
Absorption  quantifies  how  the  incident  light  is  ab¬ 
sorbed  as  a  function  of  wavelength  but  ignores  the 
spectral  content  of  the  sunlight.  On  the  other  hand, 
the  MACD  integrates  the  effects  of  both  the  absorp¬ 
tion  and  solar  spectrum  into  a  single  quantity,  Jsc- 
The  MACD  gives  a  more  telling  characterization  of 
the  real  operational  solar  cell  than  absorption,  with¬ 
out  limiting  itself  to  any  particular  semiconductor. 

G.  Limitations  and  Capabilities 

The  primary  limitation  of  the  FDTD  model  of 
light- trapping  is  due  to  limited  computer  resources. 
For  example,  a  modest  silicon  solar  cell  which  is 
1  pm  thick  with  a  periodic  boundary  has  a  total 
volume  of  l^m^.  The  total  number  of  unit  cells 
is  160^.  The  total  memory  requirement  is  approx¬ 
imately  188  megabytes.  Today’s  standard  personal 
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computer  has  16  megabytes  of  memory,  much  less 
than  required  by  the  model.  The  FDTD  model  falls 
short  of  the  ray-trace  model  in  this  regard. 

The  FDTD  model  of  light-trapping  assumes 
the  geometric  features  of  the  solar  cell  are  peri¬ 
odic  which  reduces  computer  resource  requirements. 
The  solar  cell  is  described  by  multiple  layers  of 
dielectrics.  Each  dielectric  has  a  complex  surface 
structure,  i.e.  perpendicular  slats,  pyramid,  tilted- 
pyramid  ...  or  any  surface  described  by  the  func¬ 
tion,  K{x,  y).  Multiple  layer  anti-reflection  coatings 
are  also  included  in  the  model.  Modern  solar  cell  de¬ 
signs  as  thin  as  0.5  /um  are  potentially  very  cheap  be¬ 
cause  of  their  low  mass  per  unit  of  performance  [25]. 
The  combination  of  the  FDTD  algorithm,  general 
texture  shapes,  and  anti-reflection  coatings  enables 
the  solar  cell  designer  to  model  and  design  thin  solar 
cells  based  on  light-trapping  more  accurately  than 
is  possible  with  any  ray-trace  model. 

III.  Full  Wave  Model  Light-Trapping 
Analysis 

In  this  section,  the  finite-difference  time-domain 
model  is  applied  to  the  light-trapping  analysis  of 
thin  silicon  solar  cells.  The  thickness  of  the  so¬ 
lar  cell  is  approximately  equal  to  the  wavelength 
of  light.  The  goal  of  the  FDTD  model  is  to  accu¬ 
rately  predict  absorption  in  thin  films.  The  cost  of 
accuracy  is  the  model’s  dependence  on  powerful  and 
expensive  computers.  Unlike  the  ray-trace  model, 
the  FDTD  model  is  not  able  to  execute  on  a  per¬ 
sonal  computer  with  only  one  megabyte  of  memory. 
All  the  light-trapping  analysis  presented  in  this  sec¬ 
tion  are  executed  on  Cray  supercomputers. 

A.  Thin  Cell 

Figure  3  compares  the  accuracy  of  the  FDTD  and 
ray-trace  models.  The  absorption  spectrum  is  cal¬ 
culated  for  a  thin,  0.75  /rm,  planar  solar  cell.  The 
ray-trace  model  calculates  the  average  absorption. 
The  FDTD  model  correctly  accounts  for  the  wave 
nature  of  light.  The  FDTD  model  successfully  pre¬ 
dicts  the  maxima  and  minima.  Figure  3  clearly 
demonstrates  the  FDTD  model  is  more  accurate 
than  the  ray-trace  model.  Note,  the  analytic  so¬ 
lution  is  not  valid  for  textured  solar  cells. 

B.  The  Maximum  Achievable  Current  Density  and 
Light- Trapping 

The  light-trapping  capability  of  a  thin  silicon  so¬ 
lar  cell  is  explored  in  this  subsection.  By  varying 
the  front  and  rear  surface  textures  of  a  0.75 /xm 
thick  solar  cell,  the  light-trapping  capability  of  some 


Fig.  3.  The  absorption  of  light  by  a  0.75 /xm  untextured 
solar  cell. 

promising  designs  is  determined  [1],  [3],  [4],  [5],  [6], 
[7],  [8],  [9].  Figure  4  shows  the  geometric  layout 
of  five  solar  cells  analyzed  with  the  FDTD  model. 
The  height  of  all  surface  textures  is  a  quarter  of 
the  thickness  of  the  cell  (h  =  t/4).  The  peak  an¬ 
gle  of  all  surface  textures  is  70.4°.  All  the  cells  are 
illuminated  by  AM1.5  sunlight.  Each  cell  in  Fig¬ 
ure  4  is  0.75  pm  thick.  For  the  planar  solar  cell,  the 
0.75  pm  thickness  is  the  distance  between  the  front 
and  rear  surface  of  the  silicon  slab.  For  the  perpen¬ 
dicular  slat  solar  cell,  the  definition  of  thickness  is 
not  obvious.  Figure  4  has  two  parallel  dotted  lines 
drawn  thru  each  cell.  The  distance  between  the  two 
lines  is  the  thickness.  The  thickness  for  the  per¬ 
pendicular  slat  solar  cell  is  not  simply  the  distance 
between  the  front  and  rear  surface  peaks.  Thick¬ 
ness  is  defined  such  that  different  solar  cells  with 
equal  cross  subsectional  area,  w^,  and  equal  thick¬ 
ness,  t,  have  equal  mass.  This  definition  of  thick¬ 
ness  ensures  a  valid  comparison  between  textured 
and  non-textured  solar  cells  always  exists.  Mass  is 
directly  proportional  to  volume.  Thickness  is  equiv¬ 
alently  defined  using  volume.  Two  solar  cells  that 
are  both  t  microns  thick  have  the  same  volume  per 
cross  subsectional  area.  For  the  planar  solar  cell, 
the  volume  of  the  unit  cell  is  tw^.  The  thickness 
for  the  planar  solar  cell  is  t  =  v/w^.  In  general, 
thickness  is  defined  as  t  =  v/w^.  The  volume  and 
cross  subsectional  area  are  the  same  for  each  of  the 
cells  in  Figure  4.  The  ratio  between  volume  and 
cross  subsectional  area  remains  constant  such  that 
t  =  v/'uP'  =  0.75  jum.  Figure  5  summarizes  the  re¬ 
sults  of  the  FDTD  light-trapping  analysis.  All  of 
the  textured  0.75  pm  solar  cells  increase  the  absorp- 
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Planar  Perpendicular  Slat  Corrugated  Pyramid  Pyramid 

Fig.  4.  The  thin  solar  cells  analyzed  with  the  FDTD  model  (  t  =  0.75  h  —  t/4  and  d  =  70.4°  ). 


TABLE  I 

The  MACD  of  different  0.75  /tm  thick  solar  cell 

DESIGNS. 


Design 

MACD  (mA/cm") 

Planar 

10.4 

Corrugated 

15.8 

Inverted  Pyramids 

16.9 

Perpendicular  Slats 

17.3 

Pyramids 

18.1 

tion  compared  to  the  planar  solar  cell.  The  corru¬ 
gated  slat  and  the  inverted  pyramid  designs  do  not 
perform  as  well  as  the  perpendicular  slat  design. 
The  perpendicular  slat  and  pyramid  designs  achieve 
at  least  50%  more  absorption  than  the  planar  so¬ 
lar  cell  at  the  shorter  wavelengths.  The  maximum 
achievable  current  density  (MACD)  under  AMI. 5 
illumination  for  each  of  the  0.75  fim  designs  is  tab¬ 
ulated  in  Table  I.  The  perpendicular  slat  design 
has  a  MACD  which  is  66%  greater  than  the  planar 
solar  cell.  The  ray-trace  model  predicts  a  12%  im¬ 
provement  when  the  perpendicular  slat  and  planar 
solar  cells  are  50  fim  thick.  The  effect  of  texturing 
is  more  important  as  the  solar  cells  thickness  de¬ 
creases.  The  solar  cell  with  the  front  pyramid  sur¬ 
face  texture  has  the  highest  MACD  of  18.1  mA/cm^. 
This  is  contrary  to  the  ray-trace  prediction  that  the 
perpendicular  slat  solar  cell  has  the  higher  MACD. 

C.  Where  are  the  Photons  Absorbed? 

The  ray-trace  model  [11]  shows  that  the  MACD  is 
not  the  best  figure  of  merit  for  light-trapping.  Based 


solely  on  the  MACD  and  the  ray-trace  model,  the 
best  light-trapping  design  is  a  thick  solar  cell  with 
a  front  surface  texture.  The  MACD  makes  the  false 
assumption  that  every  absorbed  photon  contributes 
to  the  net  current  of  a  solar  cell.  On  the  contrary, 
the  net  current  depends  on  where  the  photons  are 
absorbed  and  the  quality  of  the  semiconductor. 

What  is  the  generation  rate  for  a  thin  textured 
solar  cell?  The  generation  rate,  G{r,u),  must  sat¬ 
isfy 

F{u)A{lo)=  f  Gif,u)dv  (12) 

Jv 

where  F{u;)  is  the  solar  spectrum  and  the  total  ab¬ 
sorption,  A{uj),  is  given  by  Equation  10.  Substitu¬ 
tion  of  Equation  10  into  Equation  12  leads  to  the 
generation  rate  of 


G{f,uj)  =  F{u)) 


cr\E{r,ju;)\^ 

Is  \E{r,ju)\^/T}ods 


(13) 


In  words,  the  generation  rate  is  directly  propor¬ 
tional  to  the  absorption  density.  Prom  Equa¬ 
tion  13  and  Equation  10,  the  absorption  density  is 
G(r,w)/F(w).  The  absorption  density  (the  fraction 
of  incident  light  absorbed  per  unit  volume)  is  cal¬ 
culated  directly  by  the  FDTD  model. 

Even  though  it  is  beyond  the  scope  of  this  re¬ 
search  to  use  the  absorption  density  to  solve  the 
continuity  equation,  valuable  information  is  gained 
by  looking  at  the  absorption  density. 

Figure  6  shows  the  calculated  absorption  density 
in  the  xz-  and  yz-planes  for  the  planar  solar  cell. 
The  wavelength  of  light  is  0.35  fj,m.  The  extinction 
coefficient  of  silicon  at  this  wavelength  is  large  which 
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(a)  Perpendicular  Slats 


(b)  Corrugated  Slats 


Wavelength  (m) 


(c)  Pyramids 


(d)  Inverted  Pyramids 


Fig.  5.  The  light-trapping  capability  of  different  0.75  /rm  thick  solar  cell  designs. 


leads  to  the  rapid  exponential  decay  of  absorption 
seen  in  Figure  6.  Figure  7  shows  the  absorption  den¬ 
sity  for  the  same  solar  cell  except  the  wavelength 
of  light  is  0.70  ^m.  The  low  extinction  coefficient 
leads  to  a  uniform  average  absorption  density.  The 
absorption  density  at  the  front  and  rear  sides  of  the 
solar  cell  is  the  same.  The  FDTD  results  shows 
variations  in  the  absorption  density  caused  by  the 
interference  of  light.  Figure  8  illustrates  the  loca¬ 
tion  of  the  xz-  and  yz-planes. 


The  calculated  absorption  density  for  the  perpen¬ 
dicular  slat  solar  cell  is  shown  in  Figure  9.  The 
wavelength  of  light  is  0.35  ^m.  Unlike  the  planar 
solar  cell,  the  absorption  density  has  an  intricate 
profile.  Figure  9  shows  most  of  the  light  is  absorbed 
near  the  front  surface.  The  similar  effect  occurs  in 
the  planar  cell.  Figure  9  shows  a  sharp  peak  in  the 
absorption  density  at  the  peak  of  the  front  surface 
texture.  Figure  10  shows  the  absorption  density  for 
the  perpendicular  slat  solar  cell  at  a  wavelength  of 
0.70  /xm.  Light  reaches  the  back  surface  of  the  solar 
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(a)  Absorption  density  in  the  xz-plane 


(b)  Absorption  density  in  the  yz-plane 


Fig.  7.  The  absorption  density  for  the  0.75  nm  planar  solar  cell  with  A  =  0.70  /rm. 


Fig.  8.  An  illustration  of  the  xz-  and  yz-planes. 

cell.  Unexpectedly,  the  back  side  of  the  solar  cell  has 
the  highest  absorption  density.  The  high  absorption 
density  at  the  rear  surface  is  not  predicted  by  the 


ray-trace  model. 

D.  Summary  of  Results 

The  FDTD  model  is  more  accurate  than  the  ray- 
trace  model  for  thin  solar  cells  as  shown  in  Figure  3. 
Given  a  solar  cell  design,  the  FDTD  model  calcu¬ 
lates  the  total  absorption,  the  maximum  achievable 
current  density,  and  the  absorption  density.  Based 
on  these  output  parameters,  different  solar  cell  de¬ 
signs  are  simulated  and  compared  using  the  FDTD 
model.  The  FDTD  model  indirectly  calculates  the 
generation  rate.  The  generation  rate  is  the  product 
of  the  absorption  density  and  the  solar  spectrum. 


(a)  Absorption  density  in  the  xz-plane  (b)  Absorption  density  in  the  yz-plane 

Fig.  9.  The  absorption  density  for  the  0.75  pm  perpendicular  slat  solar  cell  design  with  A  =  0.35  pm. 


Fig.  10.  The  absorption  density  for  the  0.75  pm  perpendiculcir  slat  solar  cell  design  with  A  =  0.70  pm. 


The  generation  rate  could  be  used  to  calculate  the 
expected  current  density  of  a  solar  cell  instead  of 
the  maximum  achievable  current  density. 

Due  to  the  size  and  complexity  of  the  FDTD 
model,  only  a  few  select  solar  cell  designs  are  an¬ 
alyzed  based  on  their  light-trapping  capacity.  The 
MACD  for  the  0.75 /urn  thick  pyramid  solar  cell  is 
18.1mA/cm^.  The  MACD  for  the  0.75 pm  thick 
perpendicular  slat  solar  cell  is  17.3mA/cm^.  The 
ray-trace  model  predicts  the  perpendicular  slat  so¬ 
lar  cell  is  better  at  light  trapping  than  the  pyra¬ 
mid  design.  It  is  surprising  to  see  the  FDTD  model 
predict  that  the  0.75  pm  thick,  pyramid  design  is 


better.  The  front  pyramid  texture  increased  the 
MACD  by  74%  as  compared  to  the  planar  solar  cell. 
The  planar  solar  cell  has  a  MACD  of  10.4mA/cm^. 
For  thick  solar  cells,  greater  than  50  pm  thick,  the 
similar  increase  in  MACD  is  12%.  Texturing  be¬ 
comes  more  effective  at  increasing  light-trapping  as 
the  thickness  of  the  solar  cell  decreases. 

IV.  Discussion 

In  this  research,  two  computer  models  have  been 
developed  and  used  to  analyze  solar  cell  light¬ 
trapping.  The  characteristic  size  of  the  solar  cell 
determines  which  model  is  useful.  The  ray-trace 
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model  is  based  on  the  laws  of  geometrical  optics. 
For  this  reason,  the  ray-trace  model  is  limited  to 
thick  solar  cells  [11].  The  ray-trace  model  is  inaccu¬ 
rate  when  applied  to  the  thin  0.75  fim  thick  planar 
solar  cell.  The  ray-trace  model  ignores  the  wave  na¬ 
ture  of  light  and  interference  effects  within  the  solar 
cell.  The  Finite-Difference  Time-Domain  model  is 
based  on  solving  Maxwell’s  equations  directly.  The 
FDTD  model  accounts  for  the  wave  nature  of  light. 
The  FDTD  model  accurately  predicts  the  absorp¬ 
tion  spectrum  for  the  thin,  0.75  ^m,  planar  solar 
cell.  The  FDTD  model  is  applicable  to  thin  solar 
cells  where  the  characteristic  dimension  is  on  the 
order  of  the  wavelength  of  light. 

The  diversity  and  complexity  of  possible  solar  cell 
designs  based  on  light-trapping  creates  a  large  chal¬ 
lenge  to  the  engineer.  Textured  surfaces  and  anti¬ 
reflection  films  are  known  to  enhance  light-trapping. 
No  simple  analytical  solution  to  Maxwell’s  equa¬ 
tions  exists  for  the  diff'erent  types  of  proposed  so¬ 
lar  cell  designs.  This  problem  is  relatively  easy  to 
overcome  by  using  numerical  techniques.  Once  the 
numerical  model  is  translated  into  a  computer  lan¬ 
guage,  a  textured  solar  cell  is  as  easy  to  study  as  a 
planar  solar  cell.  Neither  the  ray-trace  model  nor 
the  FDTD  model  is  limited  to  planar  solar  cells. 
Both  models  are  capable  of  analyzing  the  light¬ 
trapping  characteristics  of  solar  cells  with  com¬ 
plicated  surface  textures  and  anti-reflection  films. 
Both  computer  models  are  designed  to  model  com¬ 
plex  light-trapping  designs.  The  models  also  handle 
dispersive  and  lossy  materials,  like  silicon. 

The  solar  cells  which  can  be  modeled  are  lim¬ 
ited  by  the  available  computer  resources.  The  peri¬ 
odic  boundary  was  introduced  to  partially  reduce 
some  of  these  computer  requirements.  The  ray- 
trace  model  was  designed  to  run  on  an  inexpensive 
personal  computer,  an  Intel-386  based  PC  with  one 
megabyte  of  memory.  The  typical  ray-trace  analy¬ 
sis  of  a  solar  cell  requires  half  a  megabyte  of  mem¬ 
ory  and  about  three  hours  of  runtime.  However, 
some  runs  took  as  long  as  a  week  to  analyze  a  sin¬ 
gle  solar  cell.  The  length  of  time  was  dependent 
on  how  many  rays  in  the  incident  beam,  the  num¬ 
ber  of  textures,  and  the  number  of  wavelengths.  To 
model  dispersion  in  silicon,  a  entire  run  has  to  be 
run  for  each  wavelength  of  incident  light.  The  in¬ 
dex  of  refraction  and  extinction  coefficient  are  de¬ 
pendent  on  wavelength.  The  FDTD  model  was  de¬ 
signed  to  improve  accuracy.  One  consequence  of 
improved  accuracy  is  the  FDTD  model  cannot  run 
on  a  personal  computer.  The  biggest  limiting  factor 
for  the  FDTD  model  is  the  computer  memory  re¬ 


quirements.  Depending  on  the  wavelength  of  light, 
the  FDTD  analysis  of  the  0.75  /xm  thick  solar  cell  re¬ 
quired  770  megabytes  at  the  wavelength  of  0.35  jum. 
This  large  memory  requirement  is  due  to  the  large 
index  of  refraction  (5.442)  for  silicon.  At  the  high¬ 
est  wavelength  of  2  fim,  the  memory  requirement  is 
only  20  megabytes. 

A  planar  solar  cell  0.75  ^tm  thick  has  a  maxi¬ 
mum  achievable  current  density  of  lOmA/cm^.  The 
MACD  is  increased  by  74%  to  18.1  mA/cm^  by  tex¬ 
turing  the  front  surface  of  the  solar  cell  with  pyra¬ 
mids.  The  perpendicular  slat  solar  cell  configura¬ 
tion  increased  the  MACD  by  only  66%.  The  ab¬ 
sorption  density  in  the  thin  solar  cell  is  more  com¬ 
plicated  than  the  standard  exponential  decay  seen 
in  thick  planar  solar  cells.  The  absorption  density 
in  a  textured  solar  cell  can  be  larger  at  the  back 
surface  than  at  the  front  surface.  Surface  recombi¬ 
nation  may  be  most  important  at  the  back  side  of 
a  thin  solar  cell.  Overall,  the  FDTD  model  demon¬ 
strates  texturing  has  a  large  effect  on  light-trapping 
in  thin  solar  cells. 

A.  Topics  for  Future  Research 

There  are  several  areas  of  light-trapping  which 
are  of  interest  to  the  solar  cell  community  and  re¬ 
quire  further  investigation.  FDTD  analysis  was  ver¬ 
ified  for  planar  solar  cells,  however,  the  surprising 
result  of  the  perpendicular  slat  solar  cell  FDTD 
analysis  necessitates  that  experimental  verification 
be  done  to  confirm  the  modeling  technique  for  tex¬ 
tured  surfaces.  Future  investigation  could  also  in¬ 
clude  improving  the  FDTD  model  to  analyzing  new 
light-trapping  designs.  The  ability  to  accurately 
model  thin  solar  cells  brings  new  avenues  to  study 
andenhance  light-trapping. 
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Abstract 

In  this  paper  extensions  to  the  impedance  network  method 
are  presented.  In  particular  the  method  has  been  applied  to 
problems  with  boundaries  extending  to  infinity.  The 
infinite  boundary  condition  can  also  be  applied  to  lines  of 
symmetry  in  the  given  geometry.  Two  dimensional 
surface  models  have  been  verified  by  comparison  of 
numerical  and  experimental  results  in  which  the  potential 
was  measured  along  the  edge  of  copper,  sheeting  of  various 
shapes  located  in  a  uniform,  quasi-static  magnetic  field. 
The  method  has  potential  for  modelling  three  dimensional 
structures  including  anisotropic  earth  planes,  arbitrarily 
shaped  buried  objects,  and  both  finite  and  infinitely  long 
faults,  dykes,  pipes,  cylinders  and  cracks. 

1.  Introduction 

The  three  dimensional  impedance  method  has  been  used  to 
model  eddy  currents  induced  in  heterogeneous  human 
models  by  a  time  varying  quasi-static  magnetic  field  [1-4]. 
In  this  work,  the  object  was  modelled  by  a  uniform  cubic 
three  dimensional  mesh  of  impedance  cells.  The  values  of 
the  impedances  were  determined  from  the  size  of  the 
element  and  the  conductivity  of  the  material  being 
modelled.  Using  standard  circuit  analysis  the  current  was 
solved  for  each  face  of  every  cell. 

Other  recent  approaches  to  the  calculation  of  eddy  currents 
using  numerical  techniques  include  an  integral  formulation 
using  a  set  of  R,  L  and  C  elements  [5],  the  boundary 
element  formulation  [6],  finite  element  modelling  [7], 
edge  element  [8],  the  integral  equation  approach  [9]  and 
many  others.  There  are  considerable  difficulties  in 
modelling  the  eddy  currents  induced  in  conductive  media 
when  bo±  the  source  field  is  non-uniform  and  the  media 
is  both  inhomogeneous  and  has  arbitrary  shape  or  is  of 
infinite  extent  in  one  or  more  directions.  In  this  paper  the 
impedance  method  [1]  is  extended  to  address  these 
limitations.  By  discretising  the  two  dimensional  area  or 
three  dimensional  volume  in  terms  of  a  spatial  array  of 
impedance  elements  various  structures  can  be  represented. 
The  individual  parameters  of  each  element  and  spatial 
variations  in  the  applied  field  can  also  be  modelled. 
Varying  symmetries  in  the  solution  space  together  with 
prudent  application  of  boundary  conditions  reduces  the 
number  of  elements  required  to  solve  a  given  problem. 
The  resultant  matrix  generated  from  this  technique  is  used 
to  solve  for  the  current  in  each  cell  for  a  spatially  varying 
time  harmonic  magnetic  field.  The  technique  is  applied  to 


the  solution  of  eddy  current  problems  in  materials  with  a 
spatial  variation  in  conductivity  in  the  presence  of  a  quasi¬ 
static  magnetic  field. 

Until  now  calculations  using  this  method  have  only  dealt 
with  finite  sized  bodies  with  air  boundaries.  This  excludes 
the  possibility  of  modelling  a  large  range  of  problems 
where  boundaries  extend  to  infinity.  For  this  reason  an 
infinite  boundary  condition  is  desirable. 

H.  Impedance  method  theory 

Maxwell's  Equation  in  differential  form  (1)  describes  the 
electric  field  E  generated  by  the  presence  of  a  time  varying 

magnetic  field  B. 

V  X  E  =  -  (1) 

at 

With  a  surface  area  S  enclosed  by  the  boundary  path  C, 
one  can  apply  Stoke's  Theorem  to  equation  (1)  and  obtain 


For  the  quasi-static  case  the  left  hand  side  of  equation  (2) 
represents  the  induced  potential  in  the  loop  V,  ie. 

V  =  ^  E.dl  (3) 

c  ~  ~ 

and  the  right  hand  side  of  equation  (2)  is  the  negative  of 
the  time  derivative  of  flux  <5  crossing  the  surface  (S), 
where 

<[)  =  J  B .  ds  (4) 

S  ~  ~ 

For  the  time  harmonic  case  we  have  the  relation 
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(8) 


at 


=  I  E.dl  =  -  jcoO 

c  ~  ~ 


(5) 


where  co  is  the  angular  frequency  of  the  magnetic  field. 


By  dividing  the  media  into  a  rectangular  mesh  of  discrete 
impedance  elements  Z,  we  can  apply  Stoke's  theorem  to 
each  cell  individually  providing  that  each  cell  is 
sufficiently  small  so  that  the  applied  magnetic  field  is 
uniform  within  its  boundary.  The  value  of  the  impedance 
on  each  side  of  the  cell  is  directly  related  to  the  length  of 
the  side  and  the  complex  dielectric  constant  of  the  cell. 
The  magnitude  of  the  inducing  magnetic  flux  is  directly 
related  to  the  area  enclosed  by  the  cell  and  the  angle 
between  the  S  and  B .  For  a  two  dimensional  surface 

formulation  the  values  of  Z  are  expressed  in  terms  of  the 
cell  length  L,  an  arbitrary  impedance  element  transverse 
thickness  A  (usually  expressed  as  an  area  in  the  three 
dimensional  model  although  it  is  the  thickness  in  the  two 
dimensional  formulation)  by  the  following  relation  [3]. 


(a  +  jcoerEo)  -  A 


where  a  is  the  conductivity  and  er  is  the  relative 
permittivity  of  the  material  and  Eq  is  the  dielectric 
constant  of  free  space. 

For  a  good  conductor  (a  »  coErEo)  and  using  a  quasi¬ 
static  assumption,  equation  (6)  becomes 


4 

j  COOi  =  ^Vrj 

j=l 

where  Vrj  is  the  voltage  across  the  resistor  in  the 
boundary  path  surrounding  the  i^^  cell. 

From  (8)  using  Ohm's  Law  we  can  solve  for  the  current  in 
a  single  loop  Icj.  The  relation  then  becomes 

4 

j(D<Di= 

j=l 

where  Icj  =  cell  i  loop  current. 


The  full  model  consists  of  many  interconnected  cell  loops 
in  a  mesh.  A  two  dimensional  representation  is  shown  in 
Fig.  2 


We  restrict  the  formulation  to  non-magnetic  materials  in 
time  harmonic  fields.  A  single  rectangular  cell  is  shown 
in  Fig.  1(a),  together  with  the  electrical  analog  (b). 


Figure  1 :  An  arbitrarily  enclosed  region  of  space  and  its 
discretized  equivalent  cell. 


In  terms  of  the  discretization  process,  it  is  assumed  that 
the  magnetic  field  Bj  across  the  element  is  uniform  and  so 

the  flux  Oj  within  each  cell  is  directly  related  to  the  area 
of  the  cell  Sj.  For  an  arbitrary  four  sided  cell,  we  can  write 


Figure  2:  A  portion  of  the  two  dimensional  resistive 
element  mesh. 


Hence  the  relationship  between  a  one  cell  Icj  and  its 
nearest  neighbour  cell  currents  Icj  is 

4 

jcoOci=  ^{Ici-  Icj)-Rj  (10) 
j=l 

For  a  conductive  sheet  consisting  of  n  cells,  the  system 
can  be  represented  as 

nxn  •  [  I  ]  Ixn  ~  Ixn  (H) 
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where  Rij  forms  the  resistance  network,  I  is  a  column 
matrix  of  unknown  currents  and  O  is  a  row  matrix  of 
magnetic  flux  through  each  element. 

From  (11)  it  is  possible  to  calculate  the  loop  current  in 
each  cell  and  from  this,  calculate  the  net  current 
distribution  throughout  the  solution  space.  Thus,  we  can 
determine  the  net  current  through  each  resistive  element  R 
by  summing  the  contributing  currents  from  adjoining 
cells.  For  the  two  dimensional  surface  formulation  there 
are  a  maximum  of  two  component  currents  per  resistive 
element,  however  for  a  three  dimensional  model  there  are 
potentially  four  component  currents  flowing  through  a 
resistive  element.  For  example,  if  the  resistive  element 
lies  in  the  x  direction  then  there  are  two  loop  currents  in 
the  XOY  plane  and  two  in  the  XOZ  plane  [4]. 

Note  that  the  applied  field  flux  [<E>]  can  be  spatially 
variable  for  each  element  and  so  non-uniform  fields  can  be 
modelled.  The  flux  generated  by  induced  currents  in  a 
particular  cell  will  be  in  the  opposite  direction  to  the 
applied  flux  and  extends  radially  from  the  cell.  If  we 
consider  the  four  immediately  adjacent  cells  then  to  a  first 
approximation  we  would  expect  less  than  25%  of  this  flux 
will  flow  through  each  adjacent  cell.  Since  the  field  falls 
away  as  r"^  contributions  to  other  cells  will  be  even  less 
significant.  Therefore  re-radiated  flux  generated  from  the 
induced  eddy  currents  is  not  considered  in  this  method. 
This  assumption  is  also  used  by  others  [1-4]. 

The  two  dimensional  surface  formulation  is  sufficient  to 
solve  many  problems  and  the  method  is  easily  adaptable 
to  three  dimensions  and  has  been  demonstrated  in  the 
literature  [1].  The  computation  of  <I>  as  it  appears  in 
equation  (1 1)  is  still  from  the  magnetic  flux  through  each 
cell  which  must  be  calculated  from  the  magnetic  field 
component  normal  to  the  surface  area  of  each  cellface. 


Boundary  conditions 

A  variety  of  boundary  conditions  are  required  for  the 
accurate  solution  of  models  of  arbitrary  shape  or  semi¬ 
infinite  size.  Application  of  the  appropriate  boundary 
conditions  can  reduce  the  number  of  cells  required  to  solve 
a  given  problem  and  hence  increase  the  computational 
efficiency.  This  allows  larger  problems  to  be  solved  on  a 
given  platform.  A  number  of  boundary  conditions  are 
discussed  below. 

(i)  Open  boundary/insulator  boundary 

In  the  case  of  an  open  boundary  where  a  conductor  meets 
an  insulator,  cells  lying  on  the  edge  do  not  allow  current 
to  flow  beyond  the  boundary,  ie.  there  are  no 
sources/sinks  of  current  beyond  the  boundary.  Hence  for 
an  open  boundary  the  contributing  current  Icj  in  equation 

(8)  is  zero.  Thus  any  element  at  this  boundary  is 
expressed  in  terms  of  its  four  bounding  resistances  and  up 
to  three  nearest  neighbouring  current  elements. 


(ii)  Infinite  boundary 

Where  a  uniform  material  extends  to  infinity  and  is 
subjected  to  a  uniform  magnetic  field  it  can  be  classified 
as  an  infinite  boundary.  At  infinity,  adjoining  cells  in  the 
direction  of  the  boundary  will  have  identical  currents.  If 
the  infinite  boundary  is  located  at  x  =  ,  then  it  follows 

that  -^  =  0  and  the  net  current  flowing  through  the 
dx 

element  on  the  boundary  of  these  two  cells  will  be  zero 
(the  summation  of  two  equal  and  opposite  component 
currents).  This  is  a  Neumann  boundary  condition. 
Equation  (10)  then  becomes  a  summation  of  potentials  j  = 
1..3.  The  location  of  the  infinite  boundary  is  chosen  such 
that  in  the  solution,  the  difference  in  current  between  cells 
adjacent  to  the  boundary  is  less  than  (say)  5%. 

With  3  or  more  infinite  boundaries  the  cell  currents  are  no 
longer  constrained.  This  is  reflected  by  an  ill  conditioned 
matrix  [R]  where  no  unique  solution  exists  because  the 
number  of  unknown  quantities  exceeds  the  number  of 
knowns  and  the  matrix  equation  cannot  be  solved. 

(iii)  Symmetry 

Along  a  line  (eg.  parallel  to  the  Y  axis)  or  plane  of 

symmetry  (eg.  in  the  YOZ-plane)  then  ^  =  0-  This  is 

also  a  Neumann  boundary  condition,  identical  to  the 
infinite  boundary  condition. 


Computational  aspects 

The  computational  model  was  developed  and  implemented 
on  a  Sun  Sparc  Server  10.  Mesh  generation  and 
processing  was  implemented  using  custom  software  in  'C 
with  double  precision  variables.  For  a  two  dimensional 
surface  formulation,  mesh  generation  yields  an  n  x  n 
matrix  for  an  n  element  problem.  The  matrix  generated 
has  a  sparse  diagonally  dominant  band  structure.  Because 
the  problems  exhibit  a  low  condition  number  an  exact 
rather  than  an  iterative  inversion  routine  was 
implemented.  Standard  LU  decomposition  as  described  in 
[10]  was  used.  Matlab™  was  used  to  create  the  net  current 
vectors  and  voltage  plots  from  the  numerically  obtained 
values  of  cell  loop  currents  [I]  in  equation  (11). 

Non-rectangular  cells 

Equation  (9)  can  be  modified  to  calculate  the  induced 
current  in  one  cell  in  terms  of  any  number  of  nearest 
neighbour  cells.  It  follows  from  this  equation  that  the 
number  of  sides  n,  of  a  given  cell  determines  the  exact 
form  of  this  equation  (j=l..n).  Triangular  elements  allow 
more  accurate  representation  of  many  structures  where 
normally  highly  detailed  staircase  approximations  are 
required. 
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Ill  The  numerical  results 


To  verify  the  method  in  two  dimensions,  a  uniform, 
square,  conductive  sheet  of  dimensions  9cm  x  9cm  was 
modelled  in  a  uniform  quasi  -static  magnetic  field.  The 
numerical  model  was  evaluated  for  many  cell  sizes.  The 
voltages  between  points  on  one  edge  of  the  sheet  were 
evaluated.  This  is  equivalent  to  the  spaced  pickup 
experiment  (described  below).  Note  that  a  cell  with 
dimensions  of  3cm  x  3cm  allows  the  solution  of  only  two 
data  points  whereas  a  cell  size  of  0.25cm  x  0.25cm 
produces  1 8  data  points.  These  results  are  shown  in  Fig. 
3.  It  should  be  noted  that  there  is  good  agreement 
between  the  numerical  results  obtained  for  all  cell  sizes 
although  the  spatial  resolution  is  obviously  limited  by  the 
cell  size.  The  error  for  even  the  largest  cell  size  is  less  that 
5%. 


Contact  seperation  distance  (cm) 

Figure  3:  Model  accuracy  for  a  variety  of  cell  sizes 
(f=lkHz,  5.8x10^ S/m). 

To  model  an  infinitely  long  strip  about  a  small  region  of 
interest  infinite  boundary  conditions  are  used.  The 
conductor/air  interface  is  an  open  boundary  since  the 
currents  are  confined  to  the  conductor  with  little  current 
injection  to/ffom  the  air.  Fig.  4  shows  an  18  x  18  cell 
model  of  1cm  x  1cm  cells  with  R  =  1  /m  with  two  open 
and  two  infinite  boundary  conditions  applied. 


IV  Experimental  setup 


Experimental  measurements  were  made  to  provide  a  direct 
comparison  with  results  from  the  numerical  model.  A 
split  shielded  loop  of  diameter  50cm  was  placed  around  the 
test  area  (Fig.  5).  The  loop  was  driven  with  a  signal 
generator  at  IkHz  to  create  a  uniform  magnetic  field 
normal  to  and  inside  the  loop.  A  typical  field  strength  of 
5)iT  was  generated.  Using  a  small  search  coil,  the 
magnetic  field  inside  the  area  bounded  by  the  coil  was 
found  to  be  constant  to  within  less  than  5%.  Copper  foil 
of  various  shapes  were  placed  in  this  region  normal  to  the 
magnetic  field. 


Figure  5:  Experimental  setup. 
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Figure  4:  Model  of  an  infinitely  long  conductive  strip. 


Voltage  probes  with  electrical  connections  to  the  edge  of 
the  foil  were  connected  to  a  detector  using  co-axial  cable. 
The  detector  circuit  had  an  input  impedance  of  10^ 

This  was  created  by  using  commercially  available  JFET 
operational  amplifiers  in  an  instrumentation  amplifier 
configuration.  Detector  electronics  was  housed  in  a  small 
aluminium  box  (5cm  x  7cm  x  10cm)  which  was  battery 
powered  with  a  LCD  display.  There  was  no  earth  reference 
to  the  detector.  The  electronics  consisted  of  a  band  pass 
filter  tuned  to  IkHz,  amplifier  stages,  an  RMS  to  DC 
detector  and  DMM  (digital  multimeter)  module.  The 
instrument  was  calibrated  by  varying  the  field  strength 
stepwise  and  recording  the  probe  output. 

The  objective  in  this  design  was  to  minimise  any  electric 
field  generation  from  the  coil  and  magnetic  or  electric  field 
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induction  in  the  detector  electronics.  When  the  earth 
connection  to  the  shield  of  the  excitation  coil  was 
removed,  the  signal  level  of  a  test  coil  placed  within  the 
test  area  showed  poor  directionality  indicating  electric  field 
coupling.  With  the  earth  connection  in  place,  the  detector 
output  was  zero  when  the  wire  test  coil  was  oriented 
parallel  to  the  magnetic  field. 


V  Results 


In  the  first  experiment  a  square  sheet  of  copper  foil  (9cm  x 
9cm)  was  placed  in  the  quasi-static  magnetic  field  such 
that  the  field  is  normal  to  the  copper  sheet.  This  problem 
was  solved  numerically  using  a  variety  of  cell  sizes.  With 
18  x  18  cells  the  numerical  solution  for  the  quasi-static 
vector  current  distribution  is  displayed  in  Fig.  6.  This 
current  distribution  shows  a  well  known  result  where  the 
induced  current  is  circular  and  the  greatest  current 
magnitudes  lie  on  the  edge  of  the  sheet.  This  was  verified 
by  experiment  by  monitoring  the  voltage  drop  across  a 
spaced  pickup  at  various  points  on  the  edge  of  the  sheet. 
This  current  plot  gives  good  agreement  with  a  result 
generated  using  edge  analysis  [8]. 


Figure  6:  Induced  current  distribution  in  a  homogenous 
conductive  square  (9cm  x  9cm)  cell  size  (0.5cm  x  0.5cm) 
calculated  using  the  impedance  method 


Spaced  pickup 

In  this  set  of  measurements,  separation  between  the 
measurement  points  was  varied  across  the  top  of  the  sheet 
while  remaining  symmetrical  about  the  position  x  = 
4.5cm.  The  numerical  model  results  are  normalised  to  the 
experimental  magnetic  field  strength  by  using  a  field 
strength  probe.  The  model  used  had  a  cell  size  of  0.25cm. 
The  model  results  were  obtained  by  numerically 
integrating  the  currents  between  the  ends  of  the  pickup  to 
obtain  a  measured  potential.  These  results  are  displayed  in 
Fig.  7.  The  difference  between  the  model  results  and  the 
experimentally  measured  data  is  less  than  10% 
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Figure  7:  Comparison  between  impedance  method  model 
results  and  experimental  results  for  a  spaced  pickup. 


Length  variations 

In  this  experiment  a  rectangular  (20cm  x  9cm)  sheet  of 
copper  foil  was  placed  in  a  quasi-static  magnetic  field  such 
that  the  field  is  normal  to  the  surface  of  the  copper  sheet. 
The  length  of  the  sheet  was  progressively  shortened  from 
20cm  to  9cm  in  0.5cm  increments.  The  voltage  difference 
was  always  measured  between  two  corner  points  on  one 
9cm  edge.  The  numerical  model  used  had  a  cell  size  of 
0.5cm  X  0.5cm.  Fig.  8  shows  that  the  normalised 
response  of  the  model  compares  well  with  the 
experimentally  measured  results.  Clearly  as  the  size  of  the 
sheet  decreases,  the  response  also  decreases.  There  is 
however,  a  limiting  sheet  length  (approximately  9cm) 
beyond  which  there  is  no  significant  increase  in  response. 

It  is  clear  that  beyond  a  length  of  9cm  there  is  very  little 
contribution  to  the  measured  voltage  from  additional 
copper  material;  ie.  for  short  lengths  the  current  is 
constrained  by  the  y  dimension  and  beyond  the  square,  the 
current  is  constrained  by  the  x  direction. 


Cell  shape  variability 

Two  shapes  were  modelled  experimentally  and 
numerically.  First  the  response  from  a  symmetrical 
detector  with  a  separation  distance  of  9cm  across  the  top 
of  a  9cm  x  9cm  square  sheet  of  copper  foil  was  measured. 
The  foil  was  then  cut  from  a  top  corner  across  the 
diagonal  to  a  lower  comer  to  form  a  triangular  shape.  The 
detector  position  remained  unchanged.  The  response  from 
this  triangular  shape  was  then  measured.  The  two  shapes 
were  modelled  using  a  square  and  triangular  one  element 
model  respectively.  The  induced  voltage  of  the  triangular 
shape  reduces  to  66%  of  the  square  shape  in  both  the 
numerical  and  experimental  results.  The  measured  and 
numerically  modelled  results  were  found  to  differ  by  0.5% 
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Figure  8:  Model  and  experiment  results  for  copper 
rectangle  length  variations. 


Infinitely  long  homogenous  conductive  strip 

In  Fig.  4  the  region  of  interest  is  divided  into  18  x  18 
cells.  Cells  on  the  top  and  bottom  of  this  model  are  open 
boundaries  and  the  cells  on  the  left  and  right  boundaries 
are  infinite  boundaries.  Fig.  9  shows  the  vector  currents 
obtained  by  the  impedance  method.  Note  that  the  boundary 
elements  are  included  in  columns  1  and  18  of  the  figure. 
The  current  through  the  model  is  parallel  to  the  open 
boundary  edges  indicating  that  the  infinite  boundary 
condition  is  effective. 


X(cm) 


Figure  9:  Induced  currents  in  an  infinitely  long  conductive 
strip. 


Figure  10:  A  model  of  a  surface  crack  in  an  infinitely  long 
conductive  strip. 

Surface  crack  modelling 

Fig.  10  shows  a  resistive  crack  (1x5  cells,  R  =  1  x 
104)  in  an  infinitely  long  homogenous  strip  (R=l  ). 
The  impedance  method  results  are  shown  in  Fig.  11.  The 
induced  currents  flow  around  the  resistive  crack  as 
expected. 


X(cm) 


Figure  1 1 :  Induced  currents  around  a  crack  in  an  infinitely 
long  conductive  strip. 

Three  dimensional  modelling 

The  impedance  method  can  be  used  to  model  three 
dimensional  bodies  in  the  presence  of  an  applied  magnetic 
field  [3].  Computation  time  for  a  three  dimensional 
model  is  dramatically  increased  (cubic  rather  than  squared 
relationship)  for  given  cell  dimensions. 

The  two  dimensional  surface  formulation  is  adequate  for 
the  two  dimensional  cross  section  modelling  of  three 
dimensional  structures  on  three  conditions: 

(a)  The  magnetic  flux  [  O  ]  is  unaffected  by  the  nearest 
adjacent  plane  perpendicular  to  the  direction  of  the  applied 
field.  (There  is  no  significant  magnetic  field  attenuation  if 
the  distance  is  less  than  10%  of  one  skin  depth). 


(b)  The  conductivity  distribution  in  the  adjacent  plane  is 
not  significantly  different  to  that  of  the  plane  being 
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modelled.  (There  will  be  negligible  cross  planar  currents 
generated  which  give  rise  to  additional  out  of  plane  current 
sources  and  sinks.) 

(c)  The  magnetic  field  is  always  directed  parallel  to  the 
surface  vector  of  each  cell.  (If  this  is  not  the  case  then 
additional  current  sources  and  sinks  will  be  present  within 
the  model.) 

VI  Conclusions 

The  impedance  method  can  be  used  to  solve  eddy  current 
problems  in  both  bounded  and  unbounded  regions.  This 
formulation  has  the  advantage  of  using  a  variety  of  cell 
shapes  (for  example  triangular  and  rectangular)  to  best 
represent  the  structure  in  the  model.  In  addition,  there  is 
no  requirement  for  a  uniform  mesh.  Different  sizes  and 
shapes  can  be  used  within  the  one  model.  The  impedance 
method  has  been  verified  using  copper  sheet  in  a  uniform 
magnetic  field.  The  experimental  measurements  lie  within 
10%  of  the  numerical  model  results  for  most 
measurements. 

By  applying  appropriate  boundary  conditions,  small 
regions  of  interest  within  large  structures  can  be  modelled. 
This  enables  large  structures  to  be  modelled  with  only  a 
small  number  of  elements  if  the  region  of  interest  is 
small.  This  technique  allows  for  significant  CPU  savings 
and  hence  greater  resolution  . 

Three  dimensional  modelling  has  great  potential  to  model 
complex  features.  If  the  features  of  interest  he  only  in  the 
plane  orthogonal  to  the  applied  field  then  a  two 
dimensional  model  is  sufficient  since  there  will  be  no 
interaction  between  the  layers.  The  method  has  potential 
for  the  non-destructive  detection  of  sub-surface  flaws  in 
metallic  regions  eg.  aircraft  wings  and  generator  housings. 
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3-1-1  Tsushima,  Okayama  700,  Japan 


Abstract:  In  order  to  investigate  methods  for  analyzing 
electromagnetic  fields  and  to  compare  the  accuracy  and 
the  CPU  time  of  various  codes  and  so  on,  investigation 
committees  were  set  up  in  lEE  of  Japan.  In  this  paper, 
the  activities  of  various  investigation  committees  relating 
electromagnetic  field  analysis  are  described  from  the 
viewpoint  of  the  verification  of  software. 

1.  INTRODUCTION 

In  the  Institute  of  Electrical  Engineers  (lEE)  of  Japan, 
twelve  investigation  committees  for  analyzing 
electromagnetic  fields  have  been  established  from  1977  as 
shown  in  Table  1.  These  committees  are  composed  of 
20-30  members  from  universities,  institutes  and 
industries.  The  committee  has  a  meeting  every  month  or 


every  two  months  and  surveys  the  following  subjects,  fir 
example: 

(1)  recently  developed  methods  for  calculating 
electromagnetic  fields, 

(2)  validity  of  newly  developed  methods, 

(3)  new  application  areas  of  numerical  analysis  of 
electromagnetic  fields. 

Some  committees  have  proposed  models  in  order  to 
verify  various  numerical  methods.  Each  committee 
published  an  lEEJ  Technical  Report. 

In  this  paper,  verification  models  proposed  by  the 
committees  oflEE  of  Japan  and  some  results  reported  by 
committee  members  are  shown.  Results  of  calculation  cf 
magnetic  fields,  eddy  currents,  forces,  torques,  optimized 
shapes  and  measurement  carried  out  by  the  author  are  also 


Table  1  Investigation  committees  in  lEE  of  Japan 


no. 

period  * 

name 

number  of  members 

univ. 

inst. 

indust. 

1 

1977  -  1980 

Investigation  Committee  on  Electromagnetic  Field  Analysis  of  Electric 
Power  Machines  Using  Finite  Element  Method 

8 

D 

10 

2 

1980  -  1984 

Investigation  Committee  on  Applications  of  Numerical  Method  for 
Analyzing  Electric  Power  Machines 

11 

B 

12 

3 

1984  -  1987 

Investigation  Committee  on  3-D  Calculation  of  Electromagnetic  Fields 

12 

B 

10 

4 

1987  -  1990 

Investigation  Committee  on  Numerical  Analysis  of  Eddy  Current 

10 

B 

14 

5 

Investigation  Committee  on  Applications  of  Numerical  Method  for 
Analyzing  Magnetic  Fields  in  Rotating  Machines 

14 

0 

12 

6 

1990  -  1993 

Investigation  Committee  on  Techniques  for  Applications  of  3-D 
Electromagnetic  Field  Analysis 

12 

B 

17 

7 

Investigation  Committee  on  Software  for  Numerical  Analysis  of 
Magnetic  Fields  in  Rotating  Machines 

10 

0 

14 

8 

Investigation  Committee  on  Highly  Accurate  Simulation  Technique 
for  Rotating  Machines 

8 

0 

17 

9 

1993  -  1996 

Investigation  Committee  on  Electromagpietic  Field  Analysis  and  Its 
Applications  to  Optimization  Problems 

11 

3 

14 

Investigation  Committee  on  Techniques  for  Applications  of 
Electromagnetic  Field  Analysis  for  Rotating  Machines 

10 

0 

16 

11 

Investigation  Committee  on  Highly  Advanced  Optimization  Technique 
for  Electromagnetic  Problems 

13 

2 

16 

12 

1997  -  1999 

Investigation  Committee  on  Techniques  of  Electromagnetic  Field 
Analysis  for  Virtual  Engineering  of  Rotating  Machines 

B 

17 

*  :  The  committees  start  in  April  and  end  in  March. 
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discussed. 

2.  MAGNETOSTATIC  AND  EDDY  CURRENT 
MODELS  (1984-1990) 

A.  1984-1987 

The  “Investigation  Committee  on  3-D  Calculation  cf 
Electromagnetic  Fields”  (1984-1987,  Chairman:  T. 
Nakata,  Okayama  Univ.)  proposed  a  simple  3-D 
magnetostatic  model[l]  as  shown  in  Fig.  1,  so  that  many 
universities,  institutes  and  industries  can  join  the  project. 
The  rectangular  open  core  is  surrounded  by  a  d.c. 
exciting  winding  with  457  turns.  The  d.c.  current  is 
equal  to  6.56A.  A  magnetic  shield  box  of  which  the 
thickness  t  is  1.6mm  covers  the  model.  The  relative 
permeabilities  of  the  core  and  the  shield  are  assumed  to 
be  1000. 

Fig.2  shows  the  calculated  flux  distribution.  Fig.3  shows 
the  absolute  value  of  flux  density  above  the  core 
(z=l  10mm).  Fig.4  shows  the  effects  of  the  thickness  t  and 
the  relative  permeability  |is  of  the  shield  box  on  the  flux 
density  at  the  point  of  x=y=0,  2=1 10mm.  When  t  is 
increased,  the  flux  density  is  not  changed  with  t.  The 
change  of  flux  density  by  the  thickness  t  is  small  when  jis 
is  increased.  The  effect  of  the  gauge  condition  of  the  A- 
method  on  the  obtained  result  is  discussed[3].  The 
cancellation  error  of  the  T-W  method [4]  and  the  error  cf 
the  boundary  element  method[5]  are  discussed. 

B.  1987-1990 

The  “Investigation  Committee  on  Numerical  analysis  cf 
Eddy  Current”  (1987-1990,  Chairman:  T.  Onuki, 
Waseda  Univ.)  proposed  a  3-D  eddy  current  model[6]  as 


shown  in  Fig.5.  A  rectangular  ferrite  core  is  surrounded 
by  an  exciting  coil.  An  ac  current  of  which  the  effective 
value  is  lOOOAT  at  the  frequency  of  50Hz  is  applied. 
Two  aluminum  plates  are  set  on  the  upper  and  lower 
sides  of  the  core.  The  conductivity  of  the  aluminum  is 
equal  to  3.215  x  10^  S/m,  and  the  relative  permeabihty 
(pr)  of  the  core  is  assumed  to  be  3000.  Both  cases  with 
and  without  a  hole  in  the  plate  are  investigated. 

Magnetic  field  noise  is  induced  at  the  jimction  of  the  Hall 
sensor  and  lead  wires,  and  this  noise  cannot  be  easily 
eliminated  by  simply  twisting  the  lead  wires.  Therefore, 
the  flux  density  is  measured  using  a  small  search  coil 


z 


Fig.2  Flux  distribution. 


Z 


Fig.3  Spatial  distribution  of  flux  density 
(with  shield,  y=47mm,  z=110nim). 
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with  20  turns  (mean  diameter;  3mm,  height:  0.6mm, 
conductor  diameter:  0.06mm).  The  eddy  current  density 
on  the  surface  of  the  aluminum  plate  is  measured  using  a 
modified  probe  method[26],  and  the  total  eddy  current  is 
measured  using  a  Rogowski  coil. 

Fig.6  shows  the  distributions  of  flux  density  vectors. 
Fig. 7  shows  the  maximum  absolute  value  |B  |  of  the  flux 


density  along  the  line  at  z=57.5mm[7].  The  discrepancies 
between  the  calculated  and  measured  values  are  quite 
small.  Fig.  8  shows  the  distributions  of  eddy  current 
density  vectors.  Fig.9  shows  the  x-  and  y-components  cf 
the  maximum  value  of  the  eddy  current  density  on  the 
surface  (z=65mm)  of  the  aluminum  plate.  The  results 
calculated  using  various  methods  and  elements  are  almost 
the  same.  The  discrepancies  between  the  calculated  and 
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aluminum  plate 
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(b)  plan  view 

Fig.  5  3-D  eddy  current  model. 
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(a)  without  hole  lb)  with  hole 

Fi^.6  Distributions  of  flu.x  density  vectors 
(y=0mm,  cflt=0“). 


Fig.7  Distributions  of  flux  density 
(z=57.5nun). 


y  y 


(a)  without  hole  (b)  with  hole 

Fig.  8  Distributions  of  eddy  current  density  vectors 
(z=65mm,  CDt=0°). 


Fig.9  Distribution  of  eddy  current  density 
(z=65mm). 
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measured  values  may  be  due  to  an  insufficient  number  cf 
elements,  setting  error  of  sensor,  etc.  Table  2  shows  the 
comparison  of  calculated  and  measured  values  of  total . 
eddy  current  h  passing  through  the  cross  section  a-b-c-d-a 
in  Fig.5.  The  error  e  is  defined  by 


e  = 


le  (cal)  -  le  (mea) 


le  (mea) 


xl00% 


(1) 


where  Ig  (cal)  is  the  current  calculated  and  Ig  (mea)  is  the 
current  measured. 

The  number  of  iterations  of  ICCG  (Incomplete  Cholesky 
Conjugate  Gradient)  method  for  calculating  large 
simultaneous  equations,  the  CPU  time,  etc.  are  shown  in 
Table  3.  The  CPU  time  of  the  T-fl  method  in  the 
analysis  of  the  model  without  hole  decreases  considerably 
compared  with  the  A-<^  method.  The  CPU  times  of  the 
edge  element  for  the  A-^  and  T-Q  methods  are  about  1/6 
and  1/2  of  the  nodal  element.  Although  the  CPU  time  cf 
the  T-f2  method  in  the  analysis  of  the  model  with  hole  is 
much  larger  compared  with  that  without  hole  in  the  case 
of  nodal  element,  it  is  not  so  remarkable  in  the  case  cf 
edge  element.  From  the  viewpoints  of  the  CPU  time,  the 
T-f2  method  with  edge  element  is  favorable. 


3.  MODELS  FOR  CALCULATING  FORCE  AND 
TORQUE  (1990-1995) 

A.  1990-1993 

The  “Investigation  Committee  on  Software  for  Numerical 
Analysis  of  Magnetic  Fields  in  Rotating  Machines” 
(1990-1993,  Chairman:  T.  Nakata,  Okayama  Univ.) 
proposed  four  kinds  of  models  which  are  related  to 
magnetic  field  analysis  of  rotating  machines[12]. 

Fig.  10  shows  a  3-D  model  for  the  verification  of  dc  force 
calculation[13,14].  The  center  pole  and  yoke  are  made  cf 
steel.  The  coil  has  381  turns  and  the  ampere-tums  (dc) 
are  chosen  to  be  1000,  3000,  4500  and  5000  in  order  to 
investigate  the  saturation  effect  Fig.  11  shows  the  z- 
component  F^  of  electromagnetic  force  calculated  using 
the  Maxwell  stress  tensor  method,  advanced  energy 
method  and  magnetizing  cirrrent  method.  The  number  cf 
elements,  ng,  is  equal  to  108864.  The  force  calculated  by 
the  Maxwell  stress  tensor  method  using  the  edge  element 
with  A  (magnetic  vector  potential)  variable  is  the  nearest 
to  the  measured  value.  The  rate  of  increase  of  the  force 
with  current  is  reduced  above  3000AT  due  to  the 
saturation  of  the  center  pole. 


A  new  formulation  of  the  A-<^  method[8],  a  hybrid  FE- 
BE  method[9],  the  boundary  element  method  using  edge 
elements[10]  and  the  finite  element  method  with  integral 
equation[l  1]  are  discussed. 


Fig.l2  shows  the  effect  of  number  of  elements,  ng,  on  the 
results  calculated.  Fig.  13  shows  the  initial  mesh 
(ng=4032).  Each  side  of  individual  elements  is 
subdivided  into  twice  (ng=4032  X  2^=32256)  and  thnce 
(ne=4032  X  3^=108864).  The  figure  suggests  that  the 
force  calculated  by  the  Maxwell  stress  tensor  method 


Table  2  Compmnson  of  eddy  cuiTent  (with  hole) 


item 

A- 

i 

-a 

measured 

nodal 

edge 

nodal 

edge 

amplitude  of  eddy 
current  llel  (A) 

449 

451 

450 

450 

444 

error  (%) 

1.13 

1.58 

1.35^ 

1.35 

Table  3  Discretization  data  and  CPU  time 


widiout  hole 

with 

hole 

item 

A  - 

T- 

-n 

A- 

-4, 

T- 

n 

nodal 

edge 

nodal 

edge 

nodal 

edge 

nodal 

edge 

number  of  elements 

j  144(X) 

number  of  nodes 

16275 

number  of  unknowns 

43417 

41060 

22844 

22412 

42885 

41060 

22844 

22412 

number  of 
non-zero  entries 

I78I644 

653718 

632859 

423056 

1734684 

653718 

632859 

423056 

computer  storage  (MB) 

72.2 

28.4 

30.7 

19.4 

70.5 

28.4 

30.7 

19.4 

number  of  iterations 
of  ICCG  method 

1306 

513 

172 

192 

1264 

582 

1141 

327 

CPU  time  (s) 

6242 

947 

533 

290 

5870 

1069 

2001 

442 

Computer  used  :  NEC  supercomputer  SX-IE 

(maximum  speed :  285  MFLOPS) 

convergence  criterion  of  ICCG  method  :  10 
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using  the  magnetic  vector  potential  (A)  converges  to  a 
constant  value  when  ng  is  nearly  equal  to  30000.  On  the 
contrary,  the  forces  calculated  by  the  advanced  energy 
method  and  the  magnetizing  current  method  change  with 
ng.  It  is  difBcult  to  make  a  mesh  which  has  extremely 
dense  and  sparse  parts,  unless  the  nonconforming 
clement[8]  is  used.  That  is  the  reason  why  the 
convergence  characteristic  is  so  poor  in  Fig.  12. 

Fig.  14  shows  the  2-D  model  for  verification  of  the  torque 
calculation[15].  The  stator  core  is  made  of  non-oriented 
silicon  steel  (AISI:  M-36).  The  rotor  is  composed  of  4 
poles.  The  rotor  shaft  is  made  of  carbon  steel.  The  rotor 
magnets  are  made  of  SmCos  (Br=0.9T,  Hc=7.0  x  10* 
A/m),  and  magnetized  in  parallel.  Twelve  participants 
solved  the  cogging  torque  and  seven  groups  measured  it. 


Z 


h — section  A-B — ^ 


Fig.lO  3-D  model  for  verification  of  force  calculation. 


■ - ■  Maxwell  stress  tensor  method 

A - a  magnetizing  current  method  >  ^  method 

advanced  energy  method  i  (  nodal  ) 
D — Q  Maxwell  stress  tensor  method 

- A  magnetizing  current  method 

o — -o  advanced  energy  method 
^ ^  measured 


A  method 
(  edge  ) 


Fig.  15  shows  the  waveforms  of  cogging  torque  calculated 
by  twelve  (A-L)  participants.  The  maximum  and 
minimum  waveforms  of  calculated  cogging  torque  and 
measured  waveform  are  compared  in  Fig.  16.  The  2-D 
finite  element  method  using  1st  order  triangular  elements 


10^  10^  10* 
(b)  5000AT 


■ - ■  Maxwell  stress  tensor  method) 

A - A  magnetizing  current  method  1  Ci  method  (nodal) 

• - •  advanced  energy  method  j 

D - D  Maxwell  stress  tensor  method  ) 


A - A  magnetizing  current  method  ) 

0 - O  advanced  energy  method  J 

CS - El  Maxwell  stress  tensor  method  ) 

A - A  magnetizing  current  method  j 

- measured 


A  method  (edge) 
A  method  (nodal) 


Fig.l2  Effect  of  number  of  elements  n^. 


Fig.ll  •  z-component  Fz  of  electromagnetic  force. 
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Fig.13  Initial  mesh  (ne=4032). 


is  used.  The  cogging  torque,  T,  is  calculated  using  the 
Maxwell  stress  tensor  method  and  the  advanced  energy 
method.  In  order  to  obtain  the  cogging  torque  as  a 
function  of  rotor  angular  displacement,  numerical  field 
solutions  are  obtained  for  different  rotor  positions. 

The  torque  is  calculated  for  two  kinds  of  mesh  patterns  as 
shown  in  Fig.  17.  Fig.  18  shows  the  effect  of  the  mesh 
pattern  on  the  torque  calculated.  The  figure  suggests  that 
the  results  obtained  by  the  Maxwell  stress  tensor  method 
are  affected  by  the  mesh  pattern.  On  the  contrary,  the 
results  obtained  by  the  advanced  energy  method  are 
scarcely  affected  by  the  mesh  pattern.  Fig.  19  shows  the 
effect  of  the  position  of  integration  path  in  the  air  gap  cf 
the  Maxwell  stress  tensor  method  on  the  calculated 
cogging  torque  imder  no  load.  The  result  obtained  for  the 
fine  mesh  (ne=  17352)  and  the  measured  result  are  also 
shown.  The  figure  suggests  that  the  accuracy  of  the  result 
for  the  integration  path  along  the  middle  contour 
(r=l  1.75mm,  r:  radius  from  the  center  of  rotor  shaft)  in 
the  air  gap  and  that  along  the  rotor  side  contour 
(r=l  1.55mm)  are  better  than  that  along  the  stator  side 
contour  (r=l  1.95mm).  This  is  because  the  . change  of  the 
flux  distribution  is  significant  near  the  teeth  of  the  stator. 

The  cogging  torque  calculated  using  3-D  analysis  is 
compared  with  that  using  2-D  analysis[16].  It  is  shown 
that  the  result  of  3-D  analysis  is  a  little  closer  to  the 
measured  result. 

Fig.20  shows  a  linear  motor  model[12].  The  field  is 
composed  of  yoke  (steel)  and  2  pole  ferrite  magnets.  The 

y 


y 


armature  core  is  made  of  steel.  The  number  of  turns  cf 
each  winding  is  50.  Fig.21  shows  the  comparison  cf 
calculated  and  measured  thrust  and  attractive  forces. 

Fig. 22  shows  a  model  of  salient-pole  synchronous 
machine[12].  The  number  of  poles  is  6.  The  core  is  made 
of  non-oriented  silicon  steel  (AISI:  M-47).  Fig.23  shows 
the  calculated  flux  distributions  in  the  air  gap. 


Fig.15  Cogging  torques  calculated  by  twelve 
participants. 


Fig.l4  2-D  model  for  verification  of  torque  calculation. 
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Fig.l  6  Maximum  and  minimum  calculated  cogging 
torques  and  measured  one. 


B.  1993-1995 


r=11.55  11.75  11.95 
(b)  mesh-2 

Fig.l7  Mesh  in  and  around  gap  (11^=4338). 


(a)  mesh-1 


In  the  “Investigation  Committee  on  Highly  Accurate 
Simulation  Technique  for  Rotating  Machines”  (1993- 
1995,  Chairman:  M.  Itoh,  Hitach  Ltd.),  the  torque  under 
load  and  the  load  current,  etc.  of  the  permanent  magnet 
motor  model  shown  in  Fig.l4  are  analyzed[17].  Fig.24 
shows  the  connection  of  stator  windings.  Fig.25  shows 
the  torque-speed  characteristics[18].  Torques  under 
various  excitation  conditions  are  analyzed[19,20]. 

4.  MODELS  FOR  COMPARING  OPTIMIZATION 
METHODS  (1993-1996) 

The  “Investigation  Committee  on  Electromagnetic  Field 
Analysis  and  Its  Application  to  Optimization  Problems” 
(1993-1996,  Chairman:  T.  Takuma,  Kyoto  Univ.) 
proposed  five  kinds  of  models  for  the  comparison  cf 
optimization  methods[21]. 

Fig.26  shows  a  model  of  die  press  with  electromagnet  fcr 
orientation  of  magnetic  powder[22].  This  is  used  for 
producing  anisotropic  permanent  magnets.  The  die  press 
is  made  of  steel.  The  die  molds  are  set  to  form  the  radial 


□  r=ll. 55  (rotor  side)  | 

-  r=ll. 75  (middle  of  gap)  r  ne=4338 

■  r=ll. 95  (stator  side)  J 

A  r=11.75.  ne=17352 
. measuerd 


Fig.18  Effect  of  subdivision  in  air  gap.  Fig.20  Linear  motor  model. 
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flux  distribution.  The  magnetic  powder  is  inserted  in  the 
cavity.  The  ampere-turns  of  each  coil  are  33.7kAT.  x- 
and  y-components  Bx  and  By  of  flux  density  at  the  points 
along  the  line  e-f  in  the  cavity  are  specified  as  follows: 

Bx  =  1.5008  6^1  ^2) 

By  =  1.5sin9(T)j 


a; 

O 

.o 


<u 

> 


current  (A) 


(b)  measured 


Fig.21  Thrust  and  attractive  force. 


Fig. 23  Flux  distribution  along  air  gap  (1300AT). 


Fig.25  Torque-speed  characteristics. 
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where  0  is  the  angle  measured  from  the  x-axis.  The 
following  four  kinds  of  optimization  methods  are  applied 
to  this  model. 

(a)  Physical  and  Engineering  Investigation  Method 
(PEM) 

The  shape  of  the  magnetic  circuit  is  examined  from 
the  physical  and  engineering  viewpoint  (Fig.27(a)). 
Finally  four  kinds  of  design  variables  (r]-r4)  are 
chosen  as  shown  in  Fig.27  (b)  and  the  final  result 
using  the  simulated  aimealing  method  is  shown  in 
Fig.27(c). 

(b)  Genetic  Algorithm  Method  (GAM) 

Fig.28  (a)  shows  the  initial  shape.  The  points  1-6 
are  moved  in  the  radial  direction,  whereas  the  points 
7-12  are  moved  in  the  x-direction.  4  bits  are 
specified  for  each  GA  node.  Fig.28  (b)  shows  the 
final  shape  obtained  after  190  generations. 

(c)  Rosenbrock’s  Method  (RBM)  and  Simulated 
Annealing  Method  (SAM) 

Fig.29  (a)  denotes  the  definition  of  design  variables, 
n-p  and  k-m  are  denoted  by  ellipses  whose  long  and 
short  axes  are  LI,  L2,  L3  and  L4.  Fig.29  (b)  shows 
the  final  shapes  obtained  using  RBM  and  SAM. 

Fig.  30  shows  the  comparison  of  the  amplitude  and 
direction  of  the  flux  density  vector  which  are  obtained 
using  four  kinds  of  methods.  The  advantage  and 
disadvantage  of  each  method  should  be  investigated  in 
the  future. 
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die  mold 


a-b-c-d;  Dirichlet  boundai-y 
d-a:  Neumann  boundaiy 


(a)  whole  view 
y 


(b)  enlarged  view 


Fig. 26  Model  of  die  press  with  electi'omagnet. 
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(a)  investigation  of  shape  of  magnetic  circuit 


(b)  choice  of  design  variables 


(c)  final  shape 

Fig.27  Optimization  by  physical  and  engineeiing 
investigation  method  (PEM). 


(a)  points  to  be  moved 


Fig.28  Optimization  by  genetic  algorithm  (GA). 


Fig.31  shows  a  superconducting  MRI  magnet  model[21]. 
The  uniform  flux  density  is  1.5T.  The  required 
rmiformity  in  the  center  sphere  (radius:  200mm)  is  less 
than  5ppm.  Fig.32  shows  the  obtained  result  using  the 
quasi-Newton  method  and  theoretical  equation  cf 
magnetic  field[23].  The  obtained  uniformity  is  0.4ppm. 

Fig.33  shows  the  axi-symmetric  electrode  model[21]. 
Fig. 34  shows  the  obtained  shape  of  the  electrode  which 


(a)  definition  of  design  valuables 
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MRI  magnet  model. 

(b)  final  shapes 
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Fig.29  Optimization  by  Rosenbrock’s  method  (RBM) 
and  simulated  annealing  method  (SAM). 
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Fig.32  Optimal  configuration 
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(b)  direction  of  flux 

Fig.30  Flux  density  and  direction  of  flux  at  final  shape. 
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Fig.33  Axi-symmetric  electrode  model. 


produces  the  uniform  electric  field  intensity [24].  The 
electric  field  is  calculated  using  the  charge  simulation 
method  (CSM)  and  the  surface  charge  method  (SCM). 

Fig. 35  shows  the  sphere  electrode  in  a  cube[21].  Fig.36 
shows  the  obtained  shape  of  the  electrode  which  produces 
the  uniform  electric  field  intensity[24]. 

Fig.37  shows  the  3-D  shielding  model[21].  The  ferrite 
plate  has  the  relative  permeability  of  1000  and  is 


r  (cm) 


Fig.34  Obtained  shape  of  axi-symmetric  electrode. 


Fig.35  Sphere  electrode  model. 


Fig.36  Obtained  shape  of  3-D  electrode. 


energized  by  39789AT.  Fig.38  shows  the  obtained  shape 
of  the  plate  having  various  partial  thickness  so  that  the 
maximum  value  of  the  flux  density  in  the  observation 
area  is  less  than  0.005T. 


Fig.37  3-D  magnetic  shield  model. 


Fig.38  Obtained  plate  shape. 


5.  CONCLUSIONS 

The  verification  models  proposed  by  the  investigation 
committees  of  lEE  of  Japan  and  some  results  are 
discussed.  Various  knowledge  related  to  electromagnetic 
field  calculation  was  obtained  by  using  these  models. 
The  detailed  discussion  is  written  in  each  paper.  Some 
models  (Fig.  10  and  the  modified  version  of  Fig.26)  woe 
also  adopted  as  TEAM  Workshop  problems  (Problems 
20  and  25  [25]). 

The  verification  of  the  following  problems  should  be 
investigated  in  the  future: 

(a)  modeling  of  material  properties,  such  as  anisotropy, 
hysteresis  and  superconductivity, 

(b)  coupled  problems  with  heat  and  fluid  flow, 

(c)  optimization  method  of  actual  machines. 

I  think  such  activities  of  the  verification  of  softwares 
using  various  models  shown  in  this  paper  made 
considerable  contribution  to  the  progress  of  the 
electromagnetic  field  calculation  in  Japan. 
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